Search Results: "gor"

2 March 2024

Ravi Dwivedi: Malaysia Trip

Last month, I had a trip to Malaysia and Thailand. I stayed for six days in each of the countries. The selection of these countries was due to both of them granting visa-free entry to Indian tourists for some time window. This post covers the Malaysia part and Thailand part will be covered in the next post. If you want to travel to any of these countries in the visa-free time period, I have written all the questions asked during immigration and at airports during this trip here which might be of help. I mostly stayed in Kuala Lumpur and went to places around it. Although before the trip, I planned to visit Ipoh and Cameron Highlands too, but could not cover it during the trip. I found planning a trip to Malaysia a little difficult. The country is divided into two main islands - Peninsular Malaysia and Borneo. Then there are more islands - Langkawi, Penang island, Perhentian and Redang Islands. Reaching those islands seemed a little difficult to plan and I wish to visit more places in my next Malaysia trip. My first day hostel was booked in Chinatown part of Kuala Lumpur, near Pasar Seni LRT station. As soon as I checked-in and entered my room, I met another Indian named Fletcher, and after that we accompanied each other in the trip. That day, we went to Muzium Negara and Little India. I realized that if you know the right places to buy what you want, Malaysia could be quite cheap. Malaysian currency is Malaysian Ringgit (MYR). 1 MYR is equal to 18 INR. For 2 MYR, you can get a good masala tea in Little India and it costs like 4-5 MYR for a masala dosa. The vegetarian food has good availability in Kuala Lumpur, thanks to the Tamil community. I also tried Mee Goreng, which was vegetarian, and I found it fine in terms of taste. When I checked about Mee Goreng on Wikipedia, I found out that it is unique to Indian immigrants in Malaysia (and neighboring countries) but you don t get it in India!

For the next day, Fletcher had planned a trip to Genting Highlands and pre booked everything. I also planned to join him but when we went to KL Sentral to take the bus, his bus tickets were sold out. I could take a bus at a different time, but decided to visit some other place for the day and cover Genting Highlands later. At the ticket counter, I met a family from Delhi and they wanted to go to Genting Highlands but due to not getting bus tickets for that day, they decided to buy a ticket for the next day and instead planned for Batu Caves that day. I joined them and went to Batu Caves. After returning from Batu Caves, we went our separate ways. I went back and took rest at my hostel and later went to Petronas Towers at night. Petronas Towers is the icon of Kuala Lumpur. Having a photo there was a must. I was at Petronas Towers at around 9 PM. Around that time, Fletcher came back from Genting Highlands and we planned to meet at KL Sentral to head for dinner.

We went back to the same place as the day before where I had Mee Goreng. This time we had dosa and a masala tea. Their masala tea from the last day was tasty and that s why I was looking for them in the first place. We also met a Malaysian family having Indian ancestry dining there and had a nice conversation. Then we went to a place to eat roti canai in Pasar Seni market. Roti canai is a popular non-vegetarian dish in Malaysia but I took the vegetarian version.

The next day, we went to Berjaya Time Square shopping place which sells pretty cheap items for daily use and souveniers too. However, I bought souveniers from Petaling Street, which is in Chinatown. At night, we explored Bukit Bintang, which is the heart of Kuala Lumpur and is famous for its nightlife. After that, Fletcher went to Bangkok and I was in Malaysia for two more days. Next day, I went to Genting Highlands and took the cable car, which had awesome views. I came back to Kuala Lumpur by the night. The remaining day I just roamed around in Bukit Bintang. Then I took a flight for Bangkok on 7th Feb, which I will cover in the next post. In Malaysia, I met so many people from different countries - apart from people from Indian subcontinent, I met Syrians, Indonesians (Malaysia seems to be a popular destination for Indonesian tourists) and Burmese people. Meeting people from other cultures is an integral part of travel for me. My expenses for Food + Accommodation + Travel added to 10,000 INR for a week in Malaysia, while flight costs were: 13,000 INR (Delhi to Kuala Lumpur) + 10,000 INR (Kuala Lumpur to Bangkok) + 12,000 INR (Bangkok to Delhi). For OpenStreetMap users, good news is Kuala Lumpur is fairly well-mapped on OpenStreetMap.

Tips

I bought local SIM from a shop at KL Sentral station complex which had news in their name (I forgot the exact name and there are two shops having news in their name) and it was the cheapest option I could find. The SIM was 10 MYR for 5 GB data for a week. If you want to make calls too, then you need to spend extra 5 MYR.

7-Eleven and KK Mart convenience stores are everywhere in the city and they are open all the time (24 hours a day). If you are a vegetarian, you can at least get some bread and cheese from there to eat.

A lot of people know English (and many - Indians, Pakistanis, Nepalis - know Hindi) in Kuala Lumpur, so I had no language problems most of the time.

For shopping on budget, you can go to Petaling Street, Berjaya Time Square or Bukit Bintang. In particular, there is a shop named I Love KL Gifts in Bukit Bintang which had very good prices. just near the metro/monorail stattion. Check out location of the shop on OpenStreetMap.

28 February 2024

Dirk Eddelbuettel: RcppEigen 0.3.4.0.0 on CRAN: New Upstream, At Last

We are thrilled to share that RcppEigen has now upgraded to Eigen release 3.4.0! The new release 0.3.4.0.0 arrived on CRAN earlier today, and has been shipped to Debian as well. Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms. This update has been in the works for a full two and a half years! It all started with a PR #102 by Yixuan bringing the package-local changes for R integration forward to usptream release 3.4.0. We opened issue #103 to steer possible changes from reverse-dependency checking through. Lo and behold, this just stalled because a few substantial changes were needed and not coming. But after a long wait, and like a bolt out of a perfectly blue sky, Andrew revived it in January with a reverse depends run of his own along with a set of PRs. That was the push that was needed, and I steered it along with a number of reverse dependency checks, and occassional emails to maintainers. We managed to bring it down to only three packages having a hickup, and all three had received PRs thanks to Andrew and even merged them. So the plan became to release today following a final fourteen day window. And CRAN was convinced by our arguments that we followed due process. So there it is! Big big thanks to all who helped it along, especially Yixuan and Andrew but also Mikael who updated another patch set he had prepared for the previous release series. The complete NEWS file entry follows.

Changes in RcppEigen version 0.3.4.0.0 (2024-02-28)

The Eigen version has been upgrade to release 3.4.0 (Yixuan)

Extensive reverse-dependency checks ensure only three out of over 400 packages at CRAN are affected; PRs and patches helped other packages

The long-running branch also contains substantial contributions from Mikael Jagan (for the lme4 interface) and Andrew Johnson (revdep PRs)

Courtesy of CRANberries, there is also a diffstat report for the most recent release. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

25 February 2024

Russ Allbery: Review: The Fund

Review: The Fund, by Rob Copeland

Publisher:	St. Martin's Press
Copyright:	2023
ISBN:	1-250-27694-2
Format:	Kindle
Pages:	310

I first became aware of Ray Dalio when either he or his publisher plastered advertisements for The Principles all over the San Francisco 4th and King Caltrain station. If I recall correctly, there were also constant radio commercials; it was a whole thing in 2017. My brain is very good at tuning out advertisements, so my only thought at the time was "some business guy wrote a self-help book." I think I vaguely assumed he was a CEO of some traditional business, since that's usually who writes heavily marketed books like this. I did not connect him with hedge funds or Bridgewater, which I have a bad habit of confusing with Blackwater. The Principles turns out to be more of a laundered cult manual than a self-help book. And therein lies a story. Rob Copeland is currently with The New York Times, but for many years he was the hedge fund reporter for The Wall Street Journal. He covered, among other things, Bridgewater Associates, the enormous hedge fund founded by Ray Dalio. The Fund is a biography of Ray Dalio and a history of Bridgewater from its founding as a vehicle for Dalio's advising business until 2022 when Dalio, after multiple false starts and title shuffles, finally retired from running the company. (Maybe. Based on the history recounted here, it wouldn't surprise me if he was back at the helm by the time you read this.) It is one of the wildest, creepiest, and most abusive business histories that I have ever read. It's probably worth mentioning, as Copeland does explicitly, that Ray Dalio and Bridgewater hate this book and claim it's a pack of lies. Copeland includes some of their denials (and many non-denials that sound as good as confirmations to me) in footnotes that I found increasingly amusing.

A lawyer for Dalio said he "treated all employees equally, giving people at all levels the same respect and extending them the same perks."

Uh-huh. Anyway, I personally know nothing about Bridgewater other than what I learned here and the occasional mention in Matt Levine's newsletter (which is where I got the recommendation for this book). I have no independent information whether anything Copeland describes here is true, but Copeland provides the typical extensive list of notes and sourcing one expects in a book like this, and Levine's comments indicated it's generally consistent with Bridgewater's industry reputation. I think this book is true, but since the clear implication is that the world's largest hedge fund was primarily a deranged cult whose employees mostly spied on and rated each other rather than doing any real investment work, I also have questions, not all of which Copeland answers to my satisfaction. But more on that later. The center of this book are the Principles. These were an ever-changing list of rules and maxims for how people should conduct themselves within Bridgewater. Per Copeland, although Dalio later published a book by that name, the version of the Principles that made it into the book was sanitized and significantly edited down from the version used inside the company. Dalio was constantly adding new ones and sometimes changing them, but the common theme was radical, confrontational "honesty": never being silent about problems, confronting people directly about anything that they did wrong, and telling people all of their faults so that they could "know themselves better." If this sounds like textbook abusive behavior, you have the right idea. This part Dalio admits to openly, describing Bridgewater as a firm that isn't for everyone but that achieves great results because of this culture. But the uncomfortably confrontational vibes are only the tip of the iceberg of dysfunction. Here are just a few of the ways this played out according to Copeland:

Dalio decided that everyone's opinions should be weighted by the accuracy of their previous decisions, to create a "meritocracy," and therefore hired people to build a social credit system in which people could use an app to constantly rate all of their co-workers. This almost immediately devolved into out-group bullying worthy of a high school, with employees hurriedly down-rating and ostracizing any co-worker that Dalio down-rated.
When an early version of the system uncovered two employees at Bridgewater with more credibility than Dalio, Dalio had the system rigged to ensure that he always had the highest ratings and was not affected by other people's ratings.
Dalio became so obsessed with the principle of confronting problems that he created a centralized log of problems at Bridgewater and required employees find and report a quota of ten or twenty new issues every week or have their bonus docked. He would then regularly pick some issue out of the issue log, no matter how petty, and treat it like a referendum on the worth of the person responsible for the issue.
Dalio's favorite way of dealing with a problem was to put someone on trial. This involved extensive investigations followed by a meeting where Dalio would berate the person and harshly catalog their flaws, often reducing them to tears or panic attacks, while smugly insisting that having an emotional reaction to criticism was a personality flaw. These meetings were then filmed and added to a library available to all Bridgewater employees, often edited to remove Dalio's personal abuse and to make the emotional reaction of the target look disproportionate. The ones Dalio liked the best were shown to all new employees as part of their training in the Principles.
One of the best ways to gain institutional power in Bridgewater was to become sycophantically obsessed with the Principles and to be an eager participant in Dalio's trials. The highest levels of Bridgewater featured constant jockeying for power, often by trying to catch rivals in violations of the Principles so that they would be put on trial.

In one of the common and all-too-disturbing connections between Wall Street finance and the United States' dysfunctional government, James Comey (yes, that James Comey) ran internal security for Bridgewater for three years, meaning that he was the one who pulled evidence from surveillance cameras for Dalio to use to confront employees during his trials. In case the cult vibes weren't strong enough already, Bridgewater developed its own idiosyncratic language worthy of Scientology. The trials were called "probings," firing someone was called "sorting" them, and rating them was called "dotting," among many other Bridgewater-specific terms. Needless to say, no one ever probed Dalio himself. You will also be completely unsurprised to learn that Copeland documents instances of sexual harassment and discrimination at Bridgewater, including some by Dalio himself, although that seems to be a relatively small part of the overall dysfunction. Dalio was happy to publicly humiliate anyone regardless of gender. If you're like me, at this point you're probably wondering how Bridgewater continued operating for so long in this environment. (Per Copeland, since Dalio's retirement in 2022, Bridgewater has drastically reduced the cult-like behaviors, deleted its archive of probings, and de-emphasized the Principles.) It was not actually a religious cult; it was a hedge fund that has to provide investment services to huge, sophisticated clients, and by all accounts it's a very successful one. Why did this bizarre nightmare of a workplace not interfere with Bridgewater's business? This, I think, is the weakest part of this book. Copeland makes a few gestures at answering this question, but none of them are very satisfying. First, it's clear from Copeland's account that almost none of the employees of Bridgewater had any control over Bridgewater's investments. Nearly everyone was working on other parts of the business (sales, investor relations) or on cult-related obsessions. Investment decisions (largely incorporated into algorithms) were made by a tiny core of people and often by Dalio himself. Bridgewater also appears to not trade frequently, unlike some other hedge funds, meaning that they probably stay clear of the more labor-intensive high-frequency parts of the business. Second, Bridgewater took off as a hedge fund just before the hedge fund boom in the 1990s. It transformed from Dalio's personal consulting business and investment newsletter to a hedge fund in 1990 (with an earlier investment from the World Bank in 1987), and the 1990s were a very good decade for hedge funds. Bridgewater, in part due to Dalio's connections and effective marketing via his newsletter, became one of the largest hedge funds in the world, which gave it a sort of institutional momentum. No one was questioned for putting money into Bridgewater even in years when it did poorly compared to its rivals. Third, Dalio used the tried and true method of getting free publicity from the financial press: constantly predict an upcoming downturn, and aggressively take credit whenever you were right. From nearly the start of his career, Dalio predicted economic downturns year after year. Bridgewater did very well in the 2000 to 2003 downturn, and again during the 2008 financial crisis. Dalio aggressively takes credit for predicting both of those downturns and positioning Bridgewater correctly going into them. This is correct; what he avoids mentioning is that he also predicted downturns in every other year, the majority of which never happened. These points together create a bit of an answer, but they don't feel like the whole picture and Copeland doesn't connect the pieces. It seems possible that Dalio may simply be good at investing; he reads obsessively and clearly enjoys thinking about markets, and being an abusive cult leader doesn't take up all of his time. It's also true that to some extent hedge funds are semi-free money machines, in that once you have a sufficient quantity of money and political connections you gain access to investment opportunities and mechanisms that are very likely to make money and that the typical investor simply cannot access. Dalio is clearly good at making personal connections, and invested a lot of effort into forming close ties with tricky clients such as pools of Chinese money. Perhaps the most compelling explanation isn't mentioned directly in this book but instead comes from Matt Levine. Bridgewater touts its algorithmic trading over humans making individual trades, and there is some reason to believe that consistently applying an algorithm without regard to human emotion is a solid trading strategy in at least some investment areas. Levine has asked in his newsletter, tongue firmly in cheek, whether the bizarre cult-like behavior and constant infighting is a strategy to distract all the humans and keep them from messing with the algorithm and thus making bad decisions. Copeland leaves this question unsettled. Instead, one comes away from this book with a clear vision of the most dysfunctional workplace I have ever heard of, and an endless litany of bizarre events each more astonishing than the last. If you like watching train wrecks, this is the book for you. The only drawback is that, unlike other entries in this genre such as Bad Blood or Billion Dollar Loser, Bridgewater is a wildly successful company, so you don't get the schadenfreude of seeing a house of cards collapse. You do, however, get a helpful mental model to apply to the next person who tries to talk to you about "radical honesty" and "idea meritocracy." The flaw in this book is that the existence of an organization like Bridgewater is pointing to systematic flaws in how our society works, which Copeland is largely uninterested in interrogating. "How could this have happened?" is a rather large question to leave unanswered. The sheer outrageousness of Dalio's behavior also gets a bit tiring by the end of the book, when you've seen the patterns and are hearing about the fourth variation. But this is still an astonishing book, and a worthy entry in the genre of capitalism disasters. Rating: 7 out of 10

13 February 2024

Matthew Palmer: Not all TLDs are Created Equal

In light of the recent cancellation of the queer.af domain registration by the Taliban, the fragile and difficult nature of country-code top-level domains (ccTLDs) has once again been comprehensively demonstrated. Since many people may not be aware of the risks, I thought I d give a solid explainer of the whole situation, and explain why you should, in general, not have anything to do with domains which are registered under ccTLDs.

Top-level What-Now? A top-level domain (TLD) is the last part of a domain name (the collection of words, separated by periods, after the `https://` in your web browser s location bar). It s the com in `example.com`, or the af in `queer.af`. There are two kinds of TLDs: country-code TLDs (ccTLDs) and generic TLDs (gTLDs). Despite all being TLDs, they re very different beasts under the hood.

What s the Difference? Generic TLDs are what most organisations and individuals register their domains under: old-school technobabble like com , net , or org , historical oddities like gov , and the new-fangled world of words like tech , social , and bank . These gTLDs are all regulated under a set of rules created and administered by ICANN (the Internet Corporation for Assigned Names and Numbers ), which try to ensure that things aren t a complete wild-west, limiting things like price hikes (well, sometimes, anyway), and providing means for disputes over names¹. Country-code TLDs, in contrast, are all two letters long², and are given out to countries to do with as they please. While ICANN kinda-sorta has something to do with ccTLDs (in the sense that it makes them exist on the Internet), it has no authority to control how a ccTLD is managed. If a country decides to raise prices by 100x, or cancel all registrations that were made on the 12th of the month, there s nothing anyone can do about it. If that sounds bad, that s because it is. Also, it s not a theoretical problem the Taliban deciding to asssert its bigotry over the little corner of the Internet namespace it has taken control of is far from the first time that ccTLDs have caused grief.

Shifting Sands The `queer.af` cancellation is interesting because, at the time the domain was reportedly registered, 2018, Afghanistan had what one might describe as, at least, a different political climate. Since then, of course, things have changed, and the new bosses have decided to get a bit more active. Those running `queer.af` seem to have seen the writing on the wall, and were planning on moving to another, less fraught, domain, but hadn t completed that move when the Taliban came knocking.

The Curious Case of Brexit When the United Kingdom decided to leave the European Union, it fell foul of the EU s rules for the registration of domains under the eu ccTLD³. To register (and maintain) a domain name ending in `.eu`, you have to be a resident of the EU. When the UK ceased to be part of the EU, residents of the UK were no longer EU residents. Cue much unhappiness, wailing, and gnashing of teeth when this was pointed out to Britons. Some decided to give up their domains, and move to other parts of the Internet, while others managed to hold onto them by various legal sleight-of-hand (like having an EU company maintain the registration on their behalf). In any event, all very unpleasant for everyone involved.

Geopolitics on the Internet?!? After Russia invaded Ukraine in February 2022, the Ukranian Vice Prime Minister asked ICANN to suspend ccTLDs associated with Russia. While ICANN said that it wasn t going to do that, because it wouldn t do anything useful, some domain registrars (the companies you pay to register domain names) ceased to deal in Russian ccTLDs, and some websites restricted links to domains with Russian ccTLDs. Whether or not you agree with the sort of activism implied by these actions, the fact remains that even the actions of a government that aren t directly related to the Internet can have grave consequences for your domain name if it s registered under a ccTLD. I don t think any gTLD operator will be invading a neighbouring country any time soon.

Money, Money, Money, Must Be Funny When you register a domain name, you pay a registration fee to a registrar, who does administrative gubbins and causes you to be able to control the domain name in the DNS. However, you don t own that domain name⁴ you re only renting it. When the registration period comes to an end, you have to renew the domain name, or you ll cease to be able to control it. Given that a domain name is typically your brand or identity online, the chances are you d prefer to keep it over time, because moving to a new domain name is a massive pain, having to tell all your customers or users that now you re somewhere else, plus having to accept the risk of someone registering the domain name you used to have and capturing your traffic it s all a gigantic hassle. For gTLDs, ICANN has various rules around price increases and bait-and-switch pricing that tries to keep a lid on the worst excesses of registries. While there are any number of reasonable criticisms of the rules, and the Internet community has to stay on their toes to keep ICANN from totally succumbing to regulatory capture, at least in the gTLD space there s some degree of control over price gouging. On the other hand, ccTLDs have no effective controls over their pricing. For example, in 2008 the Seychelles increased the price of `.sc` domain names from US$25 to US$75. No reason, no warning, just pay up .

Who Is Even Getting That Money? A closely related concern about ccTLDs is that some of the cool ones are assigned to countries that are not great. The poster child for this is almost certainly Libya, which has the ccTLD ly . While Libya was being run by a terrorist-supporting extremist, companies thought it was a great idea to have domain names that ended in `.ly`. These domain registrations weren t (and aren t) cheap, and it s hard to imagine that at least some of that money wasn t going to benefit the Gaddafi regime. Similarly, the British Indian Ocean Territory, which has the io ccTLD, was created in a colonialist piece of chicanery that expelled thousands of native Chagossians from Diego Garcia. Money from the registration of `.io` domains doesn t go to the (former) residents of the Chagos islands, instead it gets paid to the UK government. Again, I m not trying to suggest that all gTLD operators are wonderful people, but it s not particularly likely that the direct beneficiaries of the operation of a gTLD stole an island chain and evicted the residents.

Are ccTLDs Ever Useful? The answer to that question is an unqualified maybe . I certainly don t think it s a good idea to register a domain under a ccTLD for vanity purposes: because it makes a word, is the same as a file extension you like, or because it looks cool. Those ccTLDs that clearly represent and are associated with a particular country are more likely to be OK, because there is less impetus for the registry to try a naked cash grab. Unfortunately, ccTLD registries have a disconcerting habit of changing their minds on whether they serve their geographic locality, such as when auDA decided to declare an open season in the `.au` namespace some years ago. Essentially, while a ccTLD may have geographic connotations now, there s not a lot of guarantee that they won t fall victim to scope creep in the future. Finally, it might be somewhat safer to register under a ccTLD if you live in the location involved. At least then you might have a better idea of whether your domain is likely to get pulled out from underneath you. Unfortunately, as the `.eu` example shows, living somewhere today is no guarantee you ll still be living there tomorrow, even if you don t move house. In short, I d suggest sticking to gTLDs. They re at least lower risk than ccTLDs.

+1, Helpful If you ve found this post informative, why not buy me a refreshing beverage? My typing fingers (both of them) thank you in advance for your generosity.

Footnotes

don t make the mistake of thinking that I approve of ICANN or how it operates; it s an omnishambles of poor governance and incomprehensible decision-making.

corresponding roughly, though not precisely (because everything has to be complicated, because humans are complicated), to the entries in the ISO standard for Codes for the representation of names of countries and their subdivisions , ISO 3166.

yes, the EU is not a country; it s part of the roughly, though not precisely caveat mentioned previously.

despite what domain registrars try very hard to imply, without falling foul of deceptive advertising regulations.

8 February 2024

Dirk Eddelbuettel: RcppArmadillo 0.12.8.0.0 on CRAN: New Upstream, Interface Polish

Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1119 other packages on CRAN, downloaded 32.5 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 575 times according to Google Scholar. This release brings a new (stable) upstream (minor) release Armadillo 12.8.0 prepared by Conrad two days ago. We, as usual, prepared a release candidate which we tested against the over 1100 CRAN packages using RcppArmadillo. This found no issues, which was confirmed by CRAN once we uploaded and so it arrived as a new release today in a fully automated fashion. We also made a small change that had been prepared by GitHub issue #400: a few internal header files that were cluttering the top-level of the include directory have been moved to internal directories. The standard header is of course unaffected, and the set of full / light / lighter / lightest headers (matching we did a while back in Rcpp) also continue to work as one expects. This change was also tested in a full reverse-dependency check in January but had not been released to CRAN yet. The set of changes since the last CRAN release follows.

Changes in RcppArmadillo version 0.12.8.0.0 (2024-02-06)

Upgraded to Armadillo release 12.8.0 (Cortisol Injector)

Faster detection of symmetric expressions by pinv() and rank()

Expanded shift() to handle sparse matrices

Expanded conv_to for more flexible conversions between sparse and dense matrices

Added cbrt()

More compact representation of integers when saving matrices in CSV format

Five non-user facing top-level include files have been removed (#432 closing #400 and building on #395 and #396)

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

7 February 2024

Reproducible Builds: Reproducible Builds in January 2024

Welcome to the January 2024 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. If you are interested in contributing to the project, please visit our Contribute page on our website.

How we executed a critical supply chain attack on PyTorch John Stawinski and Adnan Khan published a lengthy blog post detailing how they executed a supply-chain attack against PyTorch, a popular machine learning platform used by titans like Google, Meta, Boeing, and Lockheed Martin :
Our exploit path resulted in the ability to upload malicious PyTorch releases to GitHub, upload releases to [Amazon Web Services], potentially add code to the main repository branch, backdoor PyTorch dependencies the list goes on. In short, it was bad. Quite bad.
The attack pivoted on PyTorch s use of self-hosted runners as well as submitting a pull request to address a trivial typo in the project s `README` file to gain access to repository secrets and API keys that could subsequently be used for malicious purposes.

New Arch Linux forensic filesystem tool On our mailing list this month, long-time Reproducible Builds developer kpcyrd announced a new tool designed to forensically analyse Arch Linux filesystem images. Called `archlinux-userland-fs-cmp`, the tool is supposed to be used from a rescue image (any Linux) with an Arch install mounted to, [for example], `/mnt`. Crucially, however, at no point is any file from the mounted filesystem eval d or otherwise executed. Parsers are written in a memory safe language. More information about the tool can be found on their announcement message, as well as on the tool s homepage. A GIF of the tool in action is also available.

Issues with our `SOURCE_DATE_EPOCH` code? Chris Lamb started a thread on our mailing list summarising some potential problems with the source code snippet the Reproducible Builds project has been using to parse the `SOURCE_DATE_EPOCH` environment variable:
I m not 100% sure who originally wrote this code, but it was probably sometime in the ~2015 era, and it must be in a huge number of codebases by now. Anyway, Alejandro Colomar was working on the shadow security tool and pinged me regarding some potential issues with the code. You can see this conversation here.
Chris ends his message with a request that those with intimate or low-level knowledge of `time_t`, C types, overflows and the various parsing libraries in the C standard library (etc.) contribute with further info.

Distribution updates In Debian this month, Roland Clobus posted another detailed update of the status of reproducible ISO images on our mailing list. In particular, Roland helpfully summarised that all major desktops build reproducibly with bullseye, bookworm, trixie and sid provided they are built for a second time within the same DAK run (i.e. [within] 6 hours) . Additionally 7 of the 8 bookworm images from the official download link build reproducibly at any later time. In addition to this, three reviews of Debian packages were added, 17 were updated and 15 were removed this month adding to our knowledge about identified issues. Elsewhere, Bernhard posted another monthly update for his work elsewhere in openSUSE.

Community updates There were made a number of improvements to our website, including Bernhard M. Wiedemann fixing a number of typos of the term nondeterministic . [ ] and Jan Zerebecki adding a substantial and highly welcome section to our page about `SOURCE_DATE_EPOCH` to document its interaction with distribution rebuilds. [ ].
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions `254` and `255` to Debian but focusing on triaging and/or merging code from other contributors. This included adding support for comparing eXtensible ARchive (.XAR/.PKG) files courtesy of Seth Michael Larson [ ][ ], as well considerable work from Vekhir in order to fix compatibility between various and subtle incompatible versions of the progressbar libraries in Python [ ][ ][ ][ ]. Thanks!

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In January, a number of changes were made by Holger Levsen:

Debian-related changes:

Reduce the number of `arm64` architecture workers from 24 to 16. [ ]

Use diffoscope from the Debian release being tested again. [ ]

Improve the handling when killing unwanted processes [ ][ ][ ] and be more verbose about it, too [ ].

Don t mark a job as failed if process marked as to-be-killed is already gone. [ ]

Display the architecture of builds that have been running for more than 48 hours. [ ]

Reboot `arm64` nodes when they hit an OOM (out of memory) state. [ ]

Package rescheduling changes:

Reduce IRC notifications to 1 when rescheduling due to package status changes. [ ]

Correctly set `SUDO_USER` when rescheduling packages. [ ]

Automatically reschedule packages regressing to FTBFS (build failure) or FTBR (build success, but unreproducible). [ ]

OpenWrt-related changes:

Install the `python3-dev` and `python3-pyelftools` packages as they are now needed for the `sunxi` target. [ ][ ]

Also install the `libpam0g-dev` which is needed by some OpenWrt hardware targets. [ ]

Misc:

As it s January, set the `real_year` variable to 2024 [ ] and bump various copyright years as well [ ].

Fix a large (!) number of spelling mistakes in various scripts. [ ][ ][ ]

Prevent Squid and Systemd processes from being killed by the kernel s OOM killer. [ ]

Install the `iptables` tool everywhere, else our custom `rc.local` script fails. [ ]

Cleanup the `/srv/workspace/pbuilder` directory on boot. [ ]

Automatically restart Squid if it fails. [ ]

Limit the execution of `chroot-installation` jobs to a maximum of 4 concurrent runs. [ ][ ]

Significant amounts of node maintenance was performed by Holger Levsen (eg. [ ][ ][ ][ ][ ][ ][ ] etc.) and Vagrant Cascadian (eg. [ ][ ][ ][ ][ ][ ][ ][ ]). Indeed, Vagrant Cascadian handled an extended power outage for the network running the Debian `armhf` architecture test infrastructure. This provided the incentive to replace the UPS batteries and consolidate infrastructure to reduce future UPS load. [ ] Elsewhere in our infrastructure, however, Holger Levsen also adjusted the email configuration for `@reproducible-builds.org` to deal with a new SMTP email attack. [ ]

Upstream patches The Reproducible Builds project tries to detects, dissects and fix as many (currently) unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Bernhard M. Wiedemann:

`cython` (nondeterminstic path issue)

`deluge` (issue with modification time of `.egg` file)

`gap-ferret`, `gap-semigroups` & `gap-simpcomp` (nondeterministic `config.log` file)

`grpc` (filesystem ordering issue )

`hub` (random)

`kubernetes1.22` & `kubernetes1.23` (sort-related issue)

`kubernetes1.24` & `kubernetes1.25` (`go -trimpath` vs random issue)

`libjcat` (drop test files with random bytes)

`luajit` (Use new `d` option for deterministic bytecode output)

`meson` [ ][ ] (sort the results from Python filesystem call)

`python-rjsmin` (drop GCC instrumentation artifacts)

`qt6-virtualkeyboard+others` (bug parallelism/race)

`SoapySDR` (parallelism-related issue)

`systemd` (sorting problem)

`warewulf` (CPIO modification time issue, etc.)

Chris Lamb:

#1060254 filed against `mumble`.

James Addison:

`guake` ( Schroedinger file due to race condition)

`qhelpgenerator-qt5` (timezone localization; fix also merged upstream for QT6)

`sphinx` (search index `doctitle` sorting)

Separate to this, Vagrant Cascadian followed up with the relevant maintainers when reproducibility fixes were not included in newly-uploaded versions of the `mm-common` package in Debian this was quickly fixed, however. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Mailing list: `rb-general@lists.reproducible-builds.org`

Mastodon: @reproducible_builds

Twitter: @ReproBuilds

2 February 2024

Ian Jackson: UPS, the Useless Parcel Service; VAT and fees

I recently had the most astonishingly bad experience with UPS, the courier company. They severely damaged my parcels, and were very bad about UK import VAT, ultimately ending up harassing me on autopilot. The only thing that got their attention was my draft Particulars of Claim for intended legal action. Surprisingly, I got them to admit in writing that the disbursement fee they charge recipients alongside the actual VAT, is just something they made up with no legal basis. What happened Autumn last year I ordered some furniture from a company in Germany. This was to be shipped by them to me by courier. The supplier chose UPS. UPS misrouted one of the three parcels to Denmark. When everything arrived, it had been sat on by elephants. The supplier had to replace most of it, with considerable inconvenience and delay to me, and of course a loss to the supplier. But this post isn t mostly about that. This post is about VAT. You see, import VAT was due, because of fucking Brexit. UPS made a complete hash of collecting that VAT. Their computers can t issue coherent documents, their email helpdesk is completely useless, and their automated debt collection systems run along uninfluenced by any external input. The crazy, including legal threats and escalating late payment fees, continued even after I paid the VAT discrepancy (which I did despite them not yet having provided any coherent calculation for it). This kind of behaviour is a very small and mild version of the kind of things British Gas did to Lisa Ferguson, who eventually won substantial damages for harassment, plus 10K of costs. Having tried asking nicely, and sending stiff letters, I too threatened litigation. I would have actually started a court claim, but it would have included a claim under the Protection from Harassment Act. Those have to be filed under the Part 8 procedure , which involves sending all of the written evidence you re going to use along with the claim form. Collating all that would be a good deal of work, especially since UPS and ControlAccount didn t engage with me at all, so I had no idea which things they might actually dispute. So I decided that before issuing proceedings, I d send them a copy of my draft Particulars of Claim, along with an offer to settle if they would pay me a modest sum and stop being evil robots at me. Rather than me typing the whole tale in again, you can read the full gory details in the PDF of my draft Particulars of Claim. (I ve redacted the reference numbers). Outcome The draft Particulars finally got their attention. UPS sent me an offer: they agreed to pay me 50, in full and final settlement. That was close enough to my offer that I accepted it. I mostly wanted them to stop, and they do seem to have done so. And I ve received the 50. VAT calculation They also finally included an actual explanation of the VAT calculation. It s absurd, but it s not UPS s absurd:

The clearance was entered initially with estimated import charges of 400.03, consisting of 387.83 VAT, and 12.20 disbursement fee. This original entry regrettably did not include the freight cost for calculating the VAT, and as such when submitted for final entry the VAT value was adjusted to include this and an amended invoice was issued for an additional 39.84. HMRC calculate the amount against which VAT is raised using the value of goods, insurance and freight, however they also may apply a VAT adjustment figure. The VAT Adjustment is based on many factors (Incidental costs in regards to a shipment), which includes charge for currency conversion if the invoice does not list values in Sterling, but the main is due to the inland freight from airport of destination to the final delivery point, as this charge varies, for example, from EMA to Edinburgh would be 150, from EMA to Derby would be 1, so each year UPS must supply HMRC with all values incurred for entry build up and they give an average which UPS have to use on the entry build up as the VAT Adjustment. The correct calculation for the import charges is therefore as follows: Goods value divided by exchange rate 2,489.53 EUR / 1.1683 = 2,130.89 GBP Duty: Goods value plus freight (%) 2,130.89 GBP + 5% = 2,237.43 GBP. That total times the duty rate. X 0 % = 0 GBP VAT: Goods value plus freight (100%) 2,130.89 GBP + 0 = 2,130.89 GBP That total plus duty and VAT adjustment 2,130.89 GBP + 0 GBP + 7.49 GBP = 2,348.08 GBP. That total times 20% VAT = 427.67 GBP As detailed above we must confirm that the final VAT charges applied to the shipment were correct, and that no refund of this is therefore due.

This looks very like HMRC-originated nonsense. If only they had put it on the original bills! It s completely ridiculous that it took four months and near-litigation to obtain it. Disbursement fee One more thing. UPS billed me a 12 disbursement fee . When you import something, there s often tax to pay. The courier company pays that to the government, and the consignee pays it to the courier. Usually the courier demands it before final delivery, since otherwise they end up having to chase it as a debt. It is common for parcel companies to add a random fee of their own. As I note in my Particulars, there isn t any legal basis for this. In my own offer of settlement I proposed that UPS should:

State under what principle of English law (such as, what enactment or principle of Common Law), you levy the disbursement fee (or refund it).

To my surprise they actually responded to this in their own settlement letter. (They didn t, for example, mention the harassment at all.) They said (emphasis mine):

A disbursement fee is a fee for amounts paid or processed on behalf of a client. It is an established category of charge used by legal firms, amongst other companies, for billing of various ancillary costs which may be incurred in completion of service. Disbursement fees are not covered by a specific law, nor are they legally prohibited. Regarding UPS disbursement fee this is an administrative charge levied for the use of UPS deferment account to prepay import charges for clearance through CDS. This charge would therefore be billed to the party that is responsible for the import charges, normally the consignee or receiver of the shipment in question. The disbursement fee as applied is legitimate, and as you have stated is a commonly used and recognised charge throughout the courier industry, and I can confirm that this was charged correctly in this instance.

On UPS s analysis, they can just make up whatever fee they like. That is clearly not right (and I don t even need to refer to consumer protection law, which would also make it obviously unlawful). And, that everyone does it doesn t make it lawful. There are so many things that are ubiquitous but unlawful, especially nowadays when much of the legal system - especially consumer protection regulators - has been underfunded to beyond the point of collapse. Next time this comes up I might have a go at getting the fee back. (Obviously I ll have to pay it first, to get my parcel.) ParcelForce and Royal Mail I think this analysis doesn t apply to ParcelForce and (probably) Royal Mail. I looked into this in 2009, and I found that Parcelforce had been given the ability to write their own private laws: Schemes made under section 89 of the Postal Services Act 2000. This is obviously ridiculous but I think it was the law in 2009. I doubt the intervening governments have fixed it. Furniture Oh, yes, the actual furniture. The replacements arrived intact and are great :-).

comments

29 January 2024

Russ Allbery: Review: Bluebird

Review: Bluebird, by Ciel Pierlot

Publisher:	Angry Robot
Copyright:	2022
ISBN:	0-85766-967-2
Format:	Kindle
Pages:	458

Bluebird is a stand-alone far-future science fiction adventure. Ten thousand years ago, a star fell into the galaxy carrying three factions of humanity. The Ascetics, the Ossuary, and the Pyrites each believe that only their god survived and the other two factions are heretics. Between them, they have conquered the rest of the galaxy and its non-human species. The only thing the factions hate worse than each other are those who attempt to stay outside the faction system. Rig used to be a Pyrite weapon designer before she set fire to her office and escaped with her greatest invention. Now she's a Nightbird, a member of an outlaw band that tries to help refugees and protect her fellow Kashrini against Pyrite genocide. On her side, she has her girlfriend, an Ascetic librarian; her ship, Bluebird; and her guns, Panache and Pizzazz. And now, perhaps, the mysterious Ginka, a Zazra empath and remarkably capable fighter who helps Rig escape from an ambush by Pyrite soldiers. Rig wants to stay alive, help her people, and defy the factions. Pyrite wants Rig's secrets and, as leverage, has her sister. What Ginka wants is not entirely clear even to Ginka. This book is absurd, but I still had fun with it. It's dangerous for me to compare things to anime given how little anime that I've watched, but Bluebird had that vibe for me: anime, or maybe Japanese RPGs or superhero comics. The storytelling is very visual, combat-oriented, and not particularly realistic. Rig is a pistol sharpshooter and Ginka is the type of undefined deadly acrobatic fighter so often seen in that type of media. In addition to her ship, Rig has a gorgeous hand-maintained racing hoverbike with a beautiful paint job. It's that sort of book. It's also the sort of book where the characters obey cinematic logic designed to maximize dramatic physical confrontations, even if their actions make no logical sense. There is no facial recognition or screening, and it's bizarrely easy for the protagonists to end up in same physical location as high-up bad guys. One of the weapon systems that's critical to the plot makes no sense whatsoever. At critical moments, the bad guys behave more like final bosses in a video game, picking up weapons to deal with the protagonists directly instead of using their supposedly vast armies of agents. There is supposedly a whole galaxy full of civilizations with capital worlds covered in planet-spanning cities, but politics barely exist and the faction leaders get directly involved in the plot. If you are looking for a realistic projection of technology or society, I cannot stress enough that this is not the book that you're looking for. You probably figured that out when I mentioned ten thousand years of war, but that will only be the beginning of the suspension of disbelief problems. You need to turn off your brain and enjoy the action sequences and melodrama. I'm normally good at that, and I admit I still struggled because the plot logic is such a mismatch with the typical novels I read. There are several points where the characters do something that seems so monumentally dumb that I was sure Pierlot was setting them up for a fall, and then I got wrong-footed because their plan worked fine, or exploded for unrelated reasons. I think this type of story, heavy on dramatic eye-candy and emotional moments with swelling soundtracks, is a lot easier to pull off in visual media where all the pretty pictures distract your brain. In a novel, there's a lot of time to think about the strategy, technology, and government structure, which for this book is not a good idea. If you can get past that, though, Rig is entertainingly snarky and Ginka, who turns out to be the emotional heart of the book, is an enjoyable character with a real growth arc. Her background is a bit simplistic and the villains are the sort of pure evil that you might expect from this type of cinematic plot, but I cared about the outcome of her story. Some parts of the plot dragged and I think the editing could have been tighter, but there was enough competence porn and banter to pull me through. I would recommend Bluebird only cautiously, since you're going to need to turn off large portions of your brain and be in the right mood for nonsensically dramatic confrontations, but I don't regret reading it. It's mostly in primary colors and the emotional conflicts are not what anyone would call subtle, but it delivers a character arc and a somewhat satisfying ending. Content warning: There is a lot of serious physical injury in this book, including surgical maiming. If that's going to bother you, you may want to give this one a pass. Rating: 6 out of 10

28 January 2024

Niels Thykier: Annotating the Debian packaging directory

In my previous blog post Providing online reference documentation for debputy, I made a point about how debhelper documentation was suboptimal on account of being static rather than online. The thing is that debhelper is not alone in this problem space, even if it is a major contributor to the number of packaging files you have to to know about. If we look at the "competition" here such as Fedora and Arch Linux, they tend to only have one packaging file. While most Debian people will tell you a long list of cons about having one packaging file (such a Fedora's spec file being 3+ domain specific languages "mashed" into one file), one major advantage is that there is only "the one packaging file". You only need to remember where to find the documentation for one file, which is great when you are running on wetware with limited storage capacity. Which means as a newbie, you can dedicate less mental resources to tracking multiple files and how they interact and more effort understanding the "one file" at hand. I started by asking myself how can we in Debian make the packaging stack more accessible to newcomers? Spoiler alert, I dug myself into rabbit hole and ended up somewhere else than where I thought I was going. I started by wanting to scan the debian directory and annotate all files that I could with documentation links. The logic was that if debputy could do that for you, then you could spend more mental effort elsewhere. So I combined debputy's packager provided files detection with a static list of files and I quickly had a good starting point for debputy-based packages.

Adding (non-static) dpkg and debhelper files to the mix Now, I could have closed the topic here and said "Look, I did debputy files plus couple of super common files". But I decided to take it a bit further. I added support for handling some dpkg files like packager provided files (such as debian/substvars and debian/symbols). But even then, we all know that debhelper is the big hurdle and a major part of the omission... In another previous blog post (A new Debian package helper: debputy), I made a point about how debputy could list all auxiliary files while debhelper could not. This was exactly the kind of feature that I would need for this feature, if this feature was to cover debhelper. Now, I also remarked in that blog post that I was not willing to maintain such a list. Also, I may have ranted about static documentation being unhelpful for debhelper as it excludes third-party provided tooling. Fortunately, a recent update to dh_assistant had provided some basic plumbing for loading dh sequences. This meant that getting a list of all relevant commands for a source package was a lot easier than it used to be. Once you have a list of commands, it would be possible to check all of them for dh's NOOP PROMISE hints. In these hints, a command can assert it does nothing if a given pkgfile is not present. This lead to the new dh_assistant list-guessed-dh-config-files command that will list all declared pkgfiles and which helpers listed them. With this combined feature set in place, debputy could call dh_assistant to get a list of pkgfiles, pretend they were packager provided files and annotate those along with manpage for the relevant debhelper command. The exciting thing about letting debpputy resolve the pkgfiles is that debputy will resolve "named" files automatically (debhelper tools will only do so when --name is passed), so it is much more likely to detect named pkgfiles correctly too. Side note: I am going to ignore the elephant in the room for now, which is dh_installsystemd and its package@.service files and the wide-spread use of debian/foo.service where there is no package called foo. For the latter case, the "proper" name would be debian/pkg.foo.service. With the new dh_assistant feature done and added to debputy, debputy could now detect the ubiquitous debian/install file. Excellent. But less great was that the very common debian/docs file was not. Turns out that dh_installdocs cannot be skipped by dh, so it cannot have NOOP PROMISE hints. Meh... Well, dh_assistant could learn about a new INTROSPECTABLE marker in addition to the NOOP PROMISE and then I could sprinkle that into a few commands. Indeed that worked and meant that debian/postinst (etc.) are now also detectable. At this point, debputy would be able to identify a wide range of debhelper related configuration files in debian/ and at least associate each of them with one or more commands. Nice, surely, this would be a good place to stop, right...?

Adding more metadata to the files The debhelper detected files only had a command name and manpage URI to that command. It would be nice if we could contextualize this a bit more. Like is this file installed into the package as is like debian/pam or is it a file list to be processed like debian/install. To make this distinction, I could add the most common debhelper file types to my static list and then merge the result together. Except, I do not want to maintain a full list in debputy. Fortunately, debputy has a quite extensible plugin infrastructure, so added a new plugin feature to provide this kind of detail and now I can outsource the problem! I split my definitions into two and placed the generic ones in the debputy-documentation plugin and moved the debhelper related ones to debhelper-documentation. Additionally, third-party dh addons could provide their own debputy plugin to add context to their configuration files. So, this gave birth file categories and configuration features, which described each file on different fronts. As an example, debian/gbp.conf could be tagged as a maint-config to signal that it is not directly related to the package build but more of a tool or style preference file. On the other hand, debian/install and debian/debputy.manifest would both be tagged as a pkg-helper-config. Files like debian/pam were tagged as ppf-file for packager provided file and so on. I mentioned configuration features above and those were added because, I have had a beef with debhelper's "standard" configuration file format as read by filearray and filedoublearray. They are often considered simple to understand, but it is hard to know how a tool will actually read the file. As an example, consider the following:

Will the debhelper use filearray, filedoublearray or none of them to read the file? This topic has about 2 bits of entropy.

Will the config file be executed if it is marked executable assuming you are using the right compat level? If it is executable, does dh-exec allow renaming for this file? This topic adds 1 or 2 bit of entropy depending on the context.

Will the config file be subject to glob expansions? This topic sounds like a boolean but is a complicated mess. The globs can be handled either by debhelper as it parses the file for you. In this case, the globs are applied to every token. However, this is not what dh_install does. Here the last token on each line is supposed to be a directory and therefore not subject to globs. Therefore, dh_install does the globbing itself afterwards but only on part of the tokens. So that is about 2 bits of entropy more. Actually, it gets worse...

If the file is executed, debhelper will refuse to expand globs in the output of the command, which was a deliberate design choice by the original debhelper maintainer took when he introduced the feature in debhelper/8.9.12. Except, dh_install feature interacts with the design choice and does enable glob expansion in the tool output, because it does so manually after its filedoublearray call.

So these "simple" files have way too many combinations of how they can be interpreted. I figured it would be helpful if debputy could highlight these difference, so I added support for those as well. Accordingly, debian/install is tagged with multiple tags including dh-executable-config and dh-glob-after-execute. Then, I added a datatable of these tags, so it would be easy for people to look up what they meant. Ok, this seems like a closed deal, right...?

Context, context, context However, the dh-executable-config tag among other are only applicable in compat 9 or later. It does not seem newbie friendly if you are told that this feature exist, but then have to read in the extended description that that it actually does not apply to your package. This problem seems fixable. Thanks to dh_assistant, it is easy to figure out which compat level the package is using. Then tweak some metadata to enable per compat level rules. With that tags like dh-executable-config only appears for packages using compat 9 or later. Also, debputy should be able to tell you where packager provided files like debian/pam are installed. We already have the logic for packager provided files that debputy supports and I am already using debputy engine for detecting the files. If only the plugin provided metadata gave me the install pattern, debputy would be able tell you where this file goes in the package. Indeed, a bit of tweaking later and setting install-pattern to usr/lib/pam.d/ name , debputy presented me with the correct install-path with the package name placing the name placeholder. Now, I have been using debian/pam as an example, because debian/pam is installed into usr/lib/pam.d in compat 14. But in earlier compat levels, it was installed into etc/pam.d. Well, I already had an infrastructure for doing compat file tags. Off we go to add install-pattern to the complat level infrastructure and now changing the compat level would change the path. Great. (Bug warning: The value is off-by-one in the current version of debhelper. This is fixed in git) Also, while we are in this install-pattern business, a number of debhelper config files causes files to be installed into a fixed directory. Like debian/docs which causes file to be installed into /usr/share/docs/ package . Surely, we can expand that as well and provide that bit of context too... and done. (Bug warning: The code currently does not account for the main documentation package context) It is rather common pattern for people to do debian/foo.in files, because they want to custom generation of debian/foo. Which means if you have debian/foo you get "Oh, let me tell you about debian/foo ". Then you rename it to debian/foo.in and the result is "debian/foo.in is a total mystery to me!". That is suboptimal, so lets detect those as well as if they were the original file but add a tag saying that they are a generate template and which file we suspect it generates. Finally, if you use debputy, almost all of the standard debhelper commands are removed from the sequence, since debputy replaces them. It would be weird if these commands still contributed configuration files when they are not actually going to be invoked. This mostly happened naturally due to the way the underlying dh_assistant command works. However, any file mentioned by the debhelper-documentation plugin would still appear unfortunately. So off I went to filter the list of known configuration files against which dh_ commands that dh_assistant thought would be used for this package.

Wrapping it up I was several layers into this and had to dig myself out. I have ended up with a lot of data and metadata. But it was quite difficult for me to arrange the output in a user friendly manner. However, all this data did seem like it would be useful any tool that wants to understand more about the package. So to get out of the rabbit hole, I for now wrapped all of this into JSON and now we have a debputy tool-support annotate-debian-directory command that might be useful for other tools. To try it out, you can try the following demo: In another day, I will figure out how to structure this output so it is useful for non-machine consumers. Suggestions are welcome. :)

Limitations of the approach As a closing remark, I should probably remind people that this feature relies heavily on declarative features. These include:

When determining which commands are relevant, using Build-Depends: dh-sequence-foo is much more reliable than configuring it via the Turing complete configuration we call debian/rules.

When debhelper commands use NOOP promise hints, dh_assistant can "see" the config files listed those hints, meaning the file will at least be detected. For new introspectable hint and the debputy plugin, it is probably better to wait until the dust settles a bit before adding any of those.

You can help yourself and others to better results by using the declarative way rather than using debian/rules, which is the bane of all introspection!

24 January 2024

Dirk Eddelbuettel: RcppAnnoy 0.0.22 on CRAN: Maintenance

A very minor maintenance release, now at version 0.0.22, of RcppAnnoy has arrived on CRAN. RcppAnnoy is the Rcpp-based R integration of the nifty Annoy library by Erik Bernhardsson. Annoy is a small and lightweight C++ template header library for very fast approximate nearest neighbours originally developed to drive the Spotify music discovery algorithm. It had all the buzzwords already a decade ago: it is one of the algorithms behind (drum roll ) vector search as it finds approximate matches very quickly and also allows to persist the data. This release responds to a CRAN request to clean up empty macros and sections in Rd files. Details of the release follow based on the NEWS file.

Changes in version 0.0.22 (2024-01-23)

Replace empty examples macro to satisfy CRAN request.

Courtesy of my CRANberries, there is also a diffstat report for this release. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

18 January 2024

Russell Coker: LicheePi 4A (RISC-V) First Look

I Just bought a LicheePi 4A RISC-V embedded computer (like a RaspberryPi but with a RISC-V CPU) for $322.68 from Aliexpress (the official site for buying LicheePi devices). Here is the Sipheed web page about it and their other recent offerings [1]. I got the version with 16G of RAM and 128G of storage, I probably don t need that much storage (I can use NFS or USB) but 16G of RAM is good for VMs. Here is the Wiki about this board [2]. Configuration When you get one of these devices you should make setting up ssh server your first priority. I found the HDMI output to be very unreliable. The first monitor I tried was a Samsung 4K monitor dating from when 4K was a new thing, the LicheePi initially refused to operate at a resolution higher than 1024*768 but later on switched to 4K resolution when resuming from screen-blank for no apparent reason (and the window manager didn t support this properly). On the Dell 4K monitor I use on my main workstation it sometimes refused to talk to it and occasionally worked. I got it running at 1920*1080 without problems and then switched it to 4K and it lost video sync and never talked to that monitor again. On my Desklab portabable 4K monitor I got it to display in 4K resolution but only the top left 1/4 of the screen displayed. The issues with HDMI monitor support greatly limit the immediate potential for using this as a workstation. It doesn t make it impossible but would be fiddly at best. It s quite likely that a future OS update will fix this. But at the moment it s best used as a server. The LicheePi has a custom Linux distribution based on Ubuntu so you want too put something like the following in /etc/network/interfaces to make it automatically connect to the ethernet when plugged in:

auto end0
iface end0 inet dhcp

Then to get sshd to start you have to run the following commands to generate ssh host keys that aren t zero bytes long:

rm /etc/ssh/ssh_host_*
systemctl restart ssh.service

It appears to have wifi hardware but the OS doesn t recognise it. This isn t a priority for me as I mostly want to use it as a server. Performance For the first test of performance I created a 100MB file from /dev/urandom and then tried compressing it on various systems. With zstd -9 it took 16.893 user seconds on the LicheePi4A, 0.428s on my Thinkpad X1 Carbon Gen5 with a i5-6300U CPU (Debian/Unstable), 1.288s on my E5-2696 v3 workstation (Debian/Bookworm), 0.467s on the E5-2696 v3 running Debian/Unstable, 2.067s on a E3-1271 v3 server, and 7.179s on the E3-1271 v3 system emulating a RISC-V system via QEMU running Debian/Unstable. It s very impressive that the QEMU emulation is fast enough that emulating a different CPU architecture is only 3.5* slower for this test (or maybe 10* slower if it was running Debian/Unstable on the AMD64 code)! The emulated RISC-V is also more than twice as fast as real RISC-V hardware and probably of comparable speed to real RISC-V hardware when running the same versions (and might be slightly slower if running the same version of zstd) which is a tribute to the quality of emulation. One performance issue that most people don t notice is the time taken to negotiate ssh sessions. It s usually not noticed because the common CPUs have got faster at about the same rate as the algorithms for encryption and authentication have become more complex. On my i5-6300U laptop it takes 0m0.384s to run ssh -i ~/.ssh/id_ed25519 localhost id with the below server settings (taken from advice on ssh-audit.com [3] for a secure ssh configuration). On the E3-1271 v3 server it is 0.336s, on the QMU system it is 28.022s, and on the LicheePi it is 0.592s. By this metric the LicheePi is about 80% slower than decent x86 systems and the QEMU emulation of RISC-V is 73* slower than the x86 system it runs on. Does crypto depend on instructions that are difficult to emulate?

HostKey /etc/ssh/ssh_host_ed25519_key
KexAlgorithms -ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha256
MACs -umac-64-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1

I haven t yet tested the performance of Ethernet (what routing speed can you get through the 2 gigabit ports?), emmc storage, and USB. At the moment I ve been focused on using RISC-V as a test and development platform. My conclusion is that I m glad I don t plan to compile many kernels or anything large like LibreOffice. But that for typical development that I do it will be quite adequate. The speed of Chromium seems adequate in basic tests, but the video output hasn t worked reliably enough to do advanced tests. Hardware Features Having two Gigabit Ethernet ports, 4 USB-3 ports, and Wifi on board gives some great options for using this as a router. It s disappointing that they didn t go with 2.5Gbit as everyone seems to be doing that nowadays but Gigabit is enough for most things. Having only a single HDMI port and not supporting USB-C docks (the USB-C port appears to be power only) limits what can be done for workstation use and for controlling displays. I know of people using small ARM computers attached to the back of large TVs for advertising purposes and that isn t going to be a great option for this. The CPU and RAM apparently uses a lot of power (which is relative the entire system draws up to 2A at 5V so the CPU would be something below 5W). To get this working a cooling fan has to be stuck to the CPU and RAM chips via a layer of thermal stuff that resembles a fine sheet of blu-tack in both color and stickyness. I am disappointed that there isn t any more solid form of construction, to mount this on a wall or ceiling some extra hardware would be needed to secure this. Also if they just had a really big copper heatsink I think that would be better. 80386 CPUs with similar TDP were able to run without a fan. I wonder how things would work with all USB ports in use. It s expected that a USB port can supply a minimum of 2.5W which means that all the ports could require 10W if they were active. Presumably something significantly less than 5W is available for the USB ports. Other Devices Sipheed has a range of other devices in the works. They currently sell the LicheeCluster4A which support 7 compute modules for a cluster in a box. This has some interesting potential for testing and demonstrating cluster software but you could probably buy an AMD64 system with more compute power for less money. The Lichee Console 4A is a tiny laptop which could be useful for people who like the 7 laptop form factor, unfortunately it only has a 1280*800 display if it had the same resolution display as a typical 7 phone I would have bought one. The next device that appeals to me is the soon to be released Lichee Pad 4A which is a 10.1 tablet with 1920*1200 display, Wifi6, Bluetooth 5.4, and 16G of RAM. It also has 1 USB-C connection, 2*USB-3 sockets, and support for an external card with 2*Gigabit ethernet. It s a tablet as a laptop without keyboard instead of the more common larger phone design model. They are also about to release the LicheePadMax4A which is similar to the other tablet but with a 14 2240*1400 display and which ships with a keyboard to make it essentially a laptop with detachable keyboard. Conclusion At this time I wouldn t recommend that this device be used as a workstation or laptop, although the people who want to do such things will probably do it anyway regardless of my recommendations. I think it will be very useful as a test system for RISC-V development. I have some friends who are interested in this sort of thing and I can give them VMs. It is a bit expensive. The Sipheed web site boasts about the LicheePi4 being faster than the RaspberryPi4, but it s not a lot faster and the RaspberryPi4 is much cheaper ($127 or $129 for one with 8G of RAM). The RaspberryPi4 has two HDMI ports but a limit of 8G of RAM while the LicheePi has up to 16G of RAM and two Gigabit Ethernet ports but only a single HDMI port. It seems that the RaspberryPi4 might win if you want a cheap low power desktop system. At this time I think the reason for this device is testing out RISC-V as an alternative to the AMD64 and ARM64 architectures. An open CPU architecture goes well with free software, but it isn t just people who are into FOSS who are testing such things. I know some corporations are trying out RISC-V as a way of getting other options for embedded systems that don t involve paying monopolists. The Lichee Console 4A is probably a usable tiny laptop if the resolution is sufficient for your needs. As an aside I predict that the tiny laptop or pocket computer segment will take off in the near future. There are some AMD64 systems the size of a phone but thicker that run Windows and go for reasonable prices on AliExpress. Hopefully in the near future this device will have better video drivers and be usable as a small and quiet workstation. I won t rule out the possibility of making this my main workstation in the not too distant future, all it needs is reliable 4K display and the ability to decode 4K video. It s performance for web browsing and as an ssh client seems adequate, and that s what matters for my workstation use. But for the moment it s just for server use.

16 January 2024

Russ Allbery: Review: Making Money

Review: Making Money, by Terry Pratchett

Series:	Discworld #36
Publisher:	Harper
Copyright:	October 2007
Printing:	November 2014
ISBN:	0-06-233499-9
Format:	Mass market
Pages:	473

Making Money is the 36th Discworld novel, the second Moist von Lipwig book, and a direct sequel to Going Postal. You could start the series with Going Postal, but I would not start here. The post office is running like a well-oiled machine, Adora Belle is out of town, and Moist von Lipwig is getting bored. It's the sort of boredom that has him picking his own locks, taking up Extreme Sneezing, and climbing buildings at night. He may not realize it, but he needs something more dangerous to do. Vetinari has just the thing. The Royal Bank of Ankh-Morpork, unlike the post office before Moist got to it, is still working. It is a stolid, boring institution doing stolid, boring things for rich people. It is also the battleground for the Lavish family past-time: suing each other and fighting over money. The Lavishes are old money, the kind of money carefully entangled in trusts and investments designed to ensure the family will always have money regardless of how stupid their children are. Control of the bank is temporarily in the grasp of Joshua Lavish's widow Topsy, who is not a true Lavish, but the vultures are circling. Meanwhile, Vetinari has grand city infrastructure plans, and to carry them out he needs financing. That means he needs a functional bank, and preferably one that is much less conservative. Moist is dubious about running a bank, and even more reluctant when Topsy Lavish sees him for exactly the con artist he is. His hand is forced when she dies, and Moist discovers he has inherited her dog, Mr. Fusspot. A dog that now owns 51% of the Royal Bank and therefore is the chairman of the bank's board of directors. A dog whose safety is tied to Moist's own by way of an expensive assassination contract. Pratchett knew he had a good story with Going Postal, so here he runs the same formula again. And yes, I was happy to read it again. Moist knows very little about banking but quite a lot about pretending something will work until it does, which has more to do with banking than it does with running a post office. The bank employs an expert, Mr. Bent, who is fanatically devoted to the gold standard and the correctness of the books and has very little patience for Moist. There are golem-related hijinks. The best part of this book is Vetinari, who is masterfully manipulating everyone in the story and who gets in some great lines about politics.

"We are not going to have another wretched empire while I am Patrician. We've only just got over the last one."

Also, Vetinari processing dead letters in the post office was an absolute delight. Making Money does have the recurring Pratchett problem of having a fairly thin plot surrounded by random... stuff. Moist's attempts to reform the city currency while staying ahead of the Lavishes is only vaguely related to Mr. Bent's plot arc. The golems are unrelated to the rest of the plot other than providing a convenient deus ex machina. There is an economist making water models in the bank basement with an Igor, which is a great gag but has essentially nothing to do with the rest of the book. One of the golems has been subjected to well-meaning older ladies and 1950s etiquette manuals, which I thought was considerably less funny (and somewhat creepier) than Pratchett did. There are (sigh) clowns, which continue to be my least favorite Ankh-Morpork world-building element. At least the dog was considerably less annoying than I was afraid it was going to be. This grab-bag randomness is a shame, since I think there was room here for a more substantial plot that engaged fully with the high weirdness of finance. Unfortunately, this was a bit like the post office in Going Postal: Pratchett dives into the subject just enough to make a few wry observations and a few funny quips, and then resolves the deeper issues off-camera. Moist tries to invent fiat currency, because of course he does, and Pratchett almost takes on the gold standard, only to veer away at the last minute into vigorous hand-waving. I suspect part of the problem is that I know a little bit too much about finance, so I kept expecting Pratchett to take the humorous social commentary a couple of levels deeper. On a similar note, the villains have great potential that Pratchett undermines by adding too much over-the-top weirdness. I wish Cosmo Lavish had been closer to what he appears to be at the start of the book: a very wealthy and vindictive man (and a reference to Cosimo de Medici) who doesn't have Moist's ability to come up with wildly risky gambits but who knows considerably more than he does about how banking works. Instead, Pratchett gives him a weird obsession that slowly makes him less sinister and more pathetic, which robs the book of a competent antagonist for Moist. The net result is still a fun book, and a solid Discworld entry, but it lacks the core of the best series entries. It felt more like a skit comedy show than a novel, but it's an excellent skit comedy show with the normal assortment of memorable Pratchettisms. Certainly if you've read this far, or even if you've only read Going Postal, you'll want to read Making Money as well. Followed by Unseen Academicals. The next Moist von Lipwig book is Raising Steam. Rating: 8 out of 10

15 January 2024

Russ Allbery: Review: The Library of Broken Worlds

Review: The Library of Broken Worlds, by Alaya Dawn Johnson

Publisher:	Scholastic Press
Copyright:	June 2023
ISBN:	1-338-29064-9
Format:	Kindle
Pages:	446

The Library of Broken Worlds is a young-adult far-future science fantasy. So far as I can tell, it's stand-alone, although more on that later in the review. Freida is the adopted daughter of Nadi, the Head Librarian, and her greatest wish is to become a librarian herself. When the book opens, she's a teenager in highly competitive training. Freida is low-wetware, without the advanced and expensive enhancements of many of the other students competing for rare and prized librarian positions, which she makes up for by being the most audacious. She doesn't need wetware to commune with the library material gods. If one ventures deep into their tunnels and consumes their crystals, direct physical communion is possible. The library tunnels are Freida's second home, in part because that's where she was born. She was created by the Library, and specifically by Iemaja, the youngest of the material gods. Precisely why is a mystery. To Nadi, Freida is her daughter. To Quinn, Nadi's main political rival within the library, Freida is a thing, a piece of the library, a secondary and possibly rogue AI. A disruptive annoyance. The Library of Broken Worlds is the sort of science fiction where figuring out what is going on is an integral part of the reading experience. It opens with a frame story of an unnamed girl (clearly Freida) waking the god Nameren and identifying herself as designed for deicide. She provokes Nameren's curiosity and offers an Arabian Nights bargain: if he wants to hear her story, he has to refrain from killing her for long enough for her to tell it. As one might expect, the main narrative doesn't catch up to the frame story until the very end of the book. The Library is indeed some type of library that librarians can search for knowledge that isn't available from more mundane sources, but Freida's personal experience of it is almost wholly religious and oracular. The library's material gods are identified as AIs, but good luck making sense of the story through a science fiction frame, even with a healthy allowance for sufficiently advanced technology being indistinguishable from magic. The symbolism and tone is entirely fantasy, and late in the book it becomes clear that whatever the material gods are, they're not simple technological AIs in the vein of, say, Banks's Ship Minds. Also, the Library is not solely a repository of knowledge. It is the keeper of an interstellar peace. The Library was founded after the Great War, to prevent a recurrence. It functions as a sort of legal system and grand tribunal in ways that are never fully explained. As you might expect, that peace is based more on stability than fairness. Five of the players in this far future of humanity are the Awilu, the most advanced society and the first to leave Earth (or Tierra as it's called here); the Mah m, who possess the material war god Nameren of the frame story; the Lunars and Martians, who dominate the Sol system; and the surviving Tierrans, residents of a polluted and struggling planet that is ruthlessly exploited by the Lunars. The problem facing Freida and her friends at the start of the book is a petition brought by a young Tierran against Lunar exploitation of his homeland. His name is Joshua, and Freida is more than half in love with him. Joshua's legal argument involves interpretation of the freedom node of the treaty that ended the Great War, a node that precedent says gives the Lunars the freedom to exploit Tierra, but which Joshua claims has a still-valid originalist meaning granting Tierrans freedom from exploitation. There is, in short, a lot going on in this book, and "never fully explained" is something of a theme. Freida is telling a story to Nameren and only explains things Nameren may not already know. The reader has to puzzle out the rest from the occasional hint. This is made more difficult by the tendency of the material gods to communicate only in visions or guided hallucinations, full of symbolism that the characters only partly explain to the reader. Nonetheless, this did mostly work, at least for me. I started this book very confused, but by about the midpoint it felt like the background was coming together. I'm still not sure I understand the aurochs, baobab, and cicada symbolism that's so central to the framing story, but it's the pleasant sort of stretchy confusion that gives my brain a good workout. I wish Johnson had explained a few more things plainly, particularly near the end of the book, but my remaining level of confusion was within my tolerances. Unfortunately, the ending did not work for me. The first time I read it, I had no idea what it meant. Lots of baffling, symbolic things happened and then the book just stopped. After re-reading the last 10%, I think all the pieces of an ending and a bit of an explanation are there, but it's absurdly abbreviated. This is another book where the author appears to have been finished with the story before I was. This keeps happening to me, so this probably says something more about me than it says about books, but I want books to have an ending. If the characters have fought and suffered through the plot, I want them to have some space to be happy and to see how their sacrifices play out, with more detail than just a few vague promises. If much of the book has been puzzling out the nature of the world, I would like some concrete confirmation of at least some of my guesswork. And if you're going to end the book on radical transformation, I want to see the results of that transformation. Johnson does an excellent job showing how brutal the peace of the powerful can be, and is willing to light more things on fire over the course of this book than most authors would, but then doesn't offer the reader much in the way of payoff. For once, I wish this stand-alone turned out to be a series. I think an additional book could be written in the aftermath of this ending, and I would definitely read that novel. Johnson has me caring deeply about these characters and fascinated by the world background, and I'd happily spend another 450 pages finding out what happens next. But, frustratingly, I think this ending was indeed intended to wrap up the story. I think this book may fall between a few stools. Science fiction readers who want mysterious future worlds to be explained by the end of the book are going to be frustrated by the amount of symbolism, allusion, and poetic description. Literary fantasy readers, who have a higher tolerance for that style, are going to wish for more focused and polished writing. A lot of the story is firmly YA: trying and failing to fit in, developing one's identity, coming into power, relationship drama, great betrayals and regrets, overcoming trauma and abuse, and unraveling lies that adults tell you. But this is definitely not a straight-forward YA plot or world background. It demands a lot from the reader, and while I am confident many teenage readers would rise to that challenge, it seems like an awkward fit for the YA marketing category. About 75% of the way in, I would have told you this book was great and you should read it. The ending was a let-down and I'm still grumpy about it. I still think it's worth your attention if you're in the mood for a sink-or-swim type of reading experience. Just be warned that when the ride ends, I felt unceremoniously dumped on the pavement. Content warnings: Rape, torture, genocide. Rating: 7 out of 10

12 January 2024

Dirk Eddelbuettel: digest 0.6.34 on CRAN: Maintanance

Release 0.6.34 of the digest package arrived at CRAN today and has also been uploaded to Debian already. digest creates hash digests of arbitrary R objects. It can use a number different hashing algorithms (md5, sha-1, sha-256, sha-512, crc32, xxhash32, xxhash64, murmur32, spookyhash, blake3, and crc32c), and ebales easy comparison of (potentially large and nested) R language objects as it relies on the native serialization in R. It is a mature and widely-used package (with 63.8 million downloads just on the partial cloud mirrors of CRAN which keep logs) as many tasks may involve caching of objects for which it provides convenient general-purpose hash key generation to quickly identify the various objects. (Oh and we also just passed the 20th anniversary of the initial CRAN upload. Time flies, as they say.) This release contains small (build-focussed) enhancements contributed by Michael Chirico, and another set of fixed for printf format warnings this time on Windows. My CRANberries provides a summary of changes to the previous version. For questions or comments use the issue tracker off the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

11 January 2024

Matthias Klumpp: Wayland really breaks things Just for now?

This post is in part a response to an aspect of Nate s post Does Wayland really break everything? , but also my reflection on discussing Wayland protocol additions, a unique pleasure that I have been involved with for the past months¹.

Some facts Before I start I want to make a few things clear: The Linux desktop will be moving to Wayland² this is a fact at this point (and has been for a while), sticking to X11 makes no sense for future projects. From reading Wayland protocols and working with it at a much lower level than I ever wanted to, it is also very clear to me that Wayland is an exceptionally well-designed core protocol, and so are the additional extension protocols (xdg-shell & Co.). The modularity of Wayland is great, it gives it incredible flexibility and will for sure turn out to be good for the long-term viability of this project (and also provides a path to correct protocol issues in future, if one is found). In other words: Wayland is an amazing foundation to build on, and a lot of its design decisions make a lot of sense! The shift towards people seeing Linux more as an application developer platform, and taking PipeWire and XDG Portals into account when designing for Wayland is also an amazing development and I love to see this this holistic approach is something I always wanted! Furthermore, I think Wayland removes a lot of functionality that shouldn t exist in a modern compositor and that s a good thing too! Some of X11 s features and design decisions had clear drawbacks that we shouldn t replicate. I highly recommend to read Nate s blog post, it s very good and goes into more detail. And due to all of this, I firmly believe that any advancement in the Wayland space must come from within the project.

But! But! Of course there was a but coming I think while developing Wayland-as-an-ecosystem we are now entrenched into narrow concepts of how a desktop should work. While discussing Wayland protocol additions, a lot of concepts clash, people from different desktops with different design philosophies debate the merits of those over and over again never reaching any conclusion (just as you will never get an answer out of humans whether sushi or pizza is the clearly superior food, or whether CSD or SSD is better). Some people want to use Wayland as a vehicle to force applications to submit to their desktop s design philosophies, others prefer the smallest and leanest protocol possible, other developers want the most elegant behavior possible. To be clear, I think those are all very valid approaches. But this also creates problems: By switching to Wayland compositors, we are already forcing a lot of porting work onto toolkit developers and application developers. This is annoying, but just work that has to be done. It becomes frustrating though if Wayland provides toolkits with absolutely no way to reach their goal in any reasonable way. For Nate s Photoshop analogy: Of course Linux does not break Photoshop, it is Adobe s responsibility to port it. But what if Linux was missing a crucial syscall that Photoshop needed for proper functionality and Adobe couldn t port it without that? In that case it becomes much less clear on who is to blame for Photoshop not being available. A lot of Wayland protocol work is focused on the environment and design, while applications and work to port them often is considered less. I think this happens because the overlap between application developers and developers of the desktop environments is not necessarily large, and the overlap with people willing to engage with Wayland upstream is even smaller. The combination of Windows developers porting apps to Linux and having involvement with toolkits or Wayland is pretty much nonexistent. So they have less of a voice.

A quick detour through the neuroscience research lab I have been involved with Freedesktop, GNOME and KDE for an incredibly long time now (more than a decade), but my actual job (besides consulting for Purism) is that of a PhD candidate in a neuroscience research lab (working on the morphology of biological neurons and its relation to behavior). I am mostly involved with three research groups in our institute, which is about 35 people. Most of us do all our data analysis on powerful servers which we connect to using RDP (with KDE Plasma as desktop). Since I joined, I have been pushing the envelope a bit to extend Linux usage to data acquisition and regular clients, and to have our data acquisition hardware interface well with it. Linux brings some unique advantages for use in research, besides the obvious one of having every step of your data management platform introspectable with no black boxes left, a goal I value very highly in research (but this would be its own blogpost). In terms of operating system usage though, most systems are still Windows-based. Windows is what companies develop for, and what people use by default and are familiar with. The choice of operating system is very strongly driven by application availability, and WSL being really good makes this somewhat worse, as it removes the need for people to switch to a real Linux system entirely if there is the occasional software requiring it. Yet, we have a lot more Linux users than before, and use it in many places where it makes sense. I also developed a novel data acquisition software that even runs on Linux-only and uses the abilities of the platform to its fullest extent. All of this resulted in me asking existing software and hardware vendors for Linux support a lot more often. Vendor-customer relationship in science is usually pretty good, and vendors do usually want to help out. Same for open source projects, especially if you offer to do Linux porting work for them But overall, the ease of use and availability of required applications and their usability rules supreme. Most people are not technically knowledgeable and just want to get their research done in the best way possible, getting the best results with the least amount of friction.
KDE/Linux usage at a control station for a particle accelerator at Adlershof Technology Park, Germany, for reference (by 25years of KDE)³

Back to the point The point of that story is this: GNOME, KDE, RHEL, Debian or Ubuntu: They all do not matter if the necessary applications are not available for them. And as soon as they are, the easiest-to-use solution wins. There are many facets of easiest : In many cases this is RHEL due to Red Hat support contracts being available, in many other cases it is Ubuntu due to its mindshare and ease of use. KDE Plasma is also frequently seen, as it is perceived a bit easier to onboard Windows users with it (among other benefits). Ultimately, it comes down to applications and 3rd-party support though. Here s a dirty secret: In many cases, porting an application to Linux is not that difficult. The thing that companies (and FLOSS projects too!) struggle with and will calculate the merits of carefully in advance is whether it is worth the support cost as well as continuous QA/testing. Their staff will have to do all of that work, and they could spend that time on other tasks after all. So if they learn that porting to Linux not only means added testing and support, but also means to choose between the legacy X11 display server that allows for 1:1 porting from Windows or the new Wayland compositors that do not support the same features they need, they will quickly consider it not worth the effort at all. I have seen this happen. Of course many apps use a cross-platform toolkit like Qt, which greatly simplifies porting. But this just moves the issue one layer down, as now the toolkit needs to abstract Windows, macOS and Wayland. And Wayland does not contain features to do certain things or does them very differently from e.g. Windows, so toolkits have no way to actually implement the existing functionality in a way that works on all platforms. So in Qt s documentation you will often find texts like works everywhere except for on Wayland compositors or mobile ⁴. Many missing bits or altered behavior are just papercuts, but those add up. And if users will have a worse experience, this will translate to more support work, or people not wanting to use the software on the respective platform.

What s missing?

Window positioning SDI applications with multiple windows are very popular in the scientific world. For data acquisition (for example with microscopes) we often have one monitor with control elements and one larger one with the recorded image. There is also other configurations where multiple signal modalities are acquired, and the experimenter aligns windows exactly in the way they want and expects the layout to be stored and to be loaded upon reopening the application. Even in the image from Adlershof Technology Park above you can see this style of UI design, at mega-scale. Being able to pop-out elements as windows from a single-window application to move them around freely is another frequently used paradigm, and immensely useful with these complex apps. It is important to note that this is not a legacy design, but in many cases an intentional choice these kinds of apps work incredibly well on larger screens or many screens and are very flexible (you can have any window configuration you want, and switch between them using the (usually) great window management abilities of your desktop). Of course, these apps will work terribly on tablets and small form factors, but that is not the purpose they were designed for and nobody would use them that way. I assumed for sure these features would be implemented at some point, but when it became clear that that would not happen, I created the ext-placement protocol which had some good discussion but was ultimately rejected from the `xdg` namespace. I then tried another solution based on feedback, which turned out not to work for most apps, and now proposed xdg-placement (v2) in an attempt to maybe still get some protocol done that we can agree on, exploring more options before pushing the existing protocol for inclusion into the `ext` Wayland protocol namespace. Meanwhile though, we can not port any application that needs this feature, while at the same time we are switching desktops and distributions to Wayland by default.

Window position restoration Similarly, a protocol to save & restore window positions was already proposed in 2018, 6 years ago now, but it has still not been agreed upon, and may not even help multiwindow apps in its current form. The absence of this protocol means that applications can not restore their former window positions, and the user has to move them to their previous place again and again. Meanwhile, toolkits can not adopt these protocols and applications can not use them and can not be ported to Wayland without introducing papercuts.

Window icons Similarly, individual windows can not set their own icons, and not-installed applications can not have an icon at all because there is no desktop-entry file to load the icon from and no icon in the theme for them. You would think this is a niche issue, but for applications that create many windows, providing icons for them so the user can find them is fairly important. Of course it s not the end of the world if every window has the same icon, but it s one of those papercuts that make the software slightly less user-friendly. Even applications with fewer windows like LibrePCB are affected, so much so that they rather run their app through Xwayland for now. I decided to address this after I was working on data analysis of image data in a Python virtualenv, where my code and the Python libraries used created lots of windows all with the default yellow W icon, making it impossible to distinguish them at a glance. This is xdg-toplevel-icon now, but of course it is an uphill battle where the very premise of needing this is questioned. So applications can not use it yet.

Limited window abilities requiring specialized protocols Firefox has a picture-in-picture feature, allowing it to pop out media from a mediaplayer as separate floating window so the user can watch the media while doing other things. On X11 this is easily realized, but on Wayland the restrictions posed on windows necessitate a different solution. The xdg-pip protocol was proposed for this specialized usecase, but it is also not merged yet. So this feature does not work as well on Wayland.

Automated GUI testing / accessibility / automation Automation of GUI tasks is a powerful feature, so is the ability to auto-test GUIs. This is being worked on, with libei and wlheadless-run (and stuff like ydotool exists too), but we re not fully there yet.

Wayland is frustrating for (some) application authors As you see, there is valid applications and valid usecases that can not be ported yet to Wayland with the same feature range they enjoyed on X11, Windows or macOS. So, from an application author s perspective, Wayland does break things quite significantly, because things that worked before can no longer work and Wayland (the whole stack) does not provide any avenue to achieve the same result. Wayland does break screen sharing, global hotkeys, gaming latency (via no tearing ) etc, however for all of these there are solutions available that application authors can port to. And most developers will gladly do that work, especially since the newer APIs are usually a lot better and more robust. But if you give application authors no path forward except use Xwayland and be on emulation as second-class citizen forever , it just results in very frustrated application developers. For some application developers, switching to a Wayland compositor is like buying a canvas from the Linux shop that forces your brush to only draw triangles. But maybe for your avant-garde art, you need to draw a circle. You can approximate one with triangles, but it will never be as good as the artwork of your friends who got their canvases from the Windows or macOS art supply shop and have more freedom to create their art.

Triangles are proven to be the best shape! If you are drawing circles you are creating bad art! Wayland, via its protocol limitations, forces a certain way to build application UX often for the better, but also sometimes to the detriment of users and applications. The protocols are often fairly opinionated, a result of the lessons learned from X11. In any case though, it is the odd one out Windows and macOS do not pose the same limitations (for better or worse!), and the effort to port to Wayland is orders of magnitude bigger, or sometimes in case of the multiwindow UI paradigm impossible to achieve to the same level of polish. Desktop environments of course have a design philosophy that they want to push, and want applications to integrate as much as possible (same as macOS and Windows!). However, there are many applications out there, and pushing a design via protocol limitations will likely just result in fewer apps.

The porting dilemma I spent probably way too much time looking into how to get applications cross-platform and running on Linux, often talking to vendors (FLOSS and proprietary) as well. Wayland limitations aren t the biggest issue by far, but they do start to come come up now, especially in the scientific space with Ubuntu having switched to Wayland by default. For application authors there is often no way to address these issues. Many scientists do not even understand why their Python script that creates some GUIs suddenly behaves weirdly because Qt is now using the Wayland backend on Ubuntu instead of X11. They do not know the difference and also do not want to deal with these details even though they may be programmers as well, the real goal is not to fiddle with the display server, but to get to a scientific result somehow. Another issue is portability layers like Wine which need to run Windows applications as-is on Wayland. Apparently Wine s Wayland driver has some heuristics to make window positioning work (and I am amazed by the work done on this!), but that can only go so far.

A way out? So, how would we actually solve this? Fundamentally, this excessively long blog post boils down to just one essential question: Do we want to force applications to submit to a UX paradigm unconditionally, potentially loosing out on application ports or keeping apps on X11 eternally, or do we want to throw them some rope to get as many applications ported over to Wayland, even through we might sacrifice some protocol purity? I think we really have to answer that to make the discussions on wayland-protocols a lot less grueling. This question can be answered at the wayland-protocols level, but even more so it must be answered by the individual desktops and compositors. If the answer for your environment turns out to be Yes, we want the Wayland protocol to be more opinionated and will not make any compromises for application portability , then your desktop/compositor should just immediately NACK protocols that add something like this and you simply shouldn t engage in the discussion, as you reject the very premise of the new protocol: That it has any merit to exist and is needed in the first place. In this case contributors to Wayland and application authors also know where you stand, and a lot of debate is skipped. Of course, if application authors want to support your environment, you are basically asking them now to rewrite their UI, which they may or may not do. But at least they know what to expect and how to target your environment. If the answer turns out to be We do want some portability , the next question obviously becomes where the line should be drawn and which changes are acceptable and which aren t. We can t blindly copy all X11 behavior, some porting work to Wayland is simply inevitable. Some written rules for that might be nice, but probably more importantly, if you agree fundamentally that there is an issue to be fixed, please engage in the discussions for the respective MRs! We for sure do not want to repeat X11 mistakes, and I am certain that we can implement protocols which provide the required functionality in a way that is a nice compromise in allowing applications a path forward into the Wayland future, while also being as good as possible and improving upon X11. For example, the toplevel-icon proposal is already a lot better than anything X11 ever had. Relaxing ACK requirements for the ext namespace is also a good proposed administrative change, as it allows some compositors to add features they want to support to the shared repository easier, while also not mandating them for others. In my opinion, it would allow for a lot less friction between the two different ideas of how Wayland protocol development should work. Some compositors could move forward and support more protocol extensions, while more restrictive compositors could support less things. Applications can detect supported protocols at launch and change their behavior accordingly (ideally even abstracted by toolkits). You may now say that a lot of apps are ported, so surely this issue can not be that bad. And yes, what Wayland provides today may be enough for 80-90% of all apps. But what I hope the detour into the research lab has done is convince you that this smaller percentage of apps matters. A lot. And that it may be worthwhile to support them. To end on a positive note: When it came to porting concrete apps over to Wayland, the only real showstoppers so far⁵ were the missing window-positioning and window-position-restore features. I encountered them when porting my own software, and I got the issue as feedback from colleagues and fellow engineers. In second place was UI testing and automation support, the window-icon issue was mentioned twice, but being a cosmetic issue it likely simply hurts people less and they can ignore it easier. What this means is that the majority of apps are already fine, and many others are very, very close! A Wayland future for everyone is within our grasp! I will also bring my two protocol MRs to their conclusion for sure, because as application developers we need clarity on what the platform (either all desktops or even just a few) supports and will or will not support in future. And the only way to get something good done is by contribution and friendly discussion.

Footnotes
Apologies for the clickbait-y title it comes with the subject
When I talk about Wayland I mean the combined set of display server protocols and accepted protocol extensions, unless otherwise clarified.
I would have picked a picture from our lab, but that would have needed permission first
Qt has awesome platform issues pages, like for macOS and Linux/X11 which help with porting efforts, but Qt doesn t even list Linux/Wayland as supported platform. There is some information though, like window geometry peculiarities, which aren t particularly helpful when porting (but still essential to know).
Besides issues with Nvidia hardware CUDA for simulations and machine-learning is pretty much everywhere, so Nvidia cards are common, which causes trouble on Wayland still. It is improving though.

4 January 2024

Aigars Mahinovs: Figuring out finances part 5

At the end of the last part of this, we got a Home Assistant OS installation that contains in itself a Firefly III instance and that contains all the current financial information. They are communicating and calculating predictions for me. The only part that I was not 100% happy with was accounting of cash transactions. You see payments in cash are mostly made away from computer and sometimes even in areas without a mobile Internet connection. And all Firefly III apps that I tried failed at the task of creating a new transaction when offline. Even the one recommended Telegram bot from Firefly III page used a dialog-based approach for creating even a simplest transaction. Issue asking for a one-shot transaction creation option stands as unresolved. Theoretically it would have been best if I could simply contribute that feature to that particular Telegram bot ... but it's written in Javascript. By mapping the API onto tasks somehow. After about 4 hours I still could not figure out where in the code anything actually happens. It all looked like just sugar or spagetty. Connectors on connectors on mappers. So I did the real open-source thing and just wrote my own tool. firefly3_telegram_oneshot is a maximally simple Telegram bot based on python-telegram-bot library. So what does it do? The primary usage for me is to simply send a message to the bot at any time with text like "23.2 coffee and cake" and when the message eventually reaches the bot, it then should create a new transaction from my "cash" account to "Unknown" account in amount of 23.20 and description "coffee and cake". That is the key. Everything else in the bot is comfort. For example "/undo" command deletes the last transaction in cash account (presumably added by error) and "/last" shows which transaction the "/undo" would delete. And to help with expense categorisation one can also do a message like "6.1 beer, dest=Edeka, cat=alcohol" that would search for a destination account that would fuzzy match to "Edeka" (a supermarket in Germany) and add the transation to the category fuzzy matched to "alcohol", like "Shopping - alcoholic drinks". And to make that fuzzy matching more reliable I also added "/cat something" and "/dest something" commands that would show which category or destination account would be matched with a given string. All of that in around 250 lines of Python code and executed by a 17 line Dockerfile (via the Portainer on my Home Assistant OS). One remaining function that could be nice is creating a category or destination account on request (for example when the first character of the supplied string is "+"). I am really plesantly surprised about how much can be done with how little code using the above Python library. And you never need to have any open incoming ports anywhere to runs such bots, so the attack surface for such bot-based service is much tighter. All in all the system works and works well. The only exception is that for my particual bank there is still no automatic way of extracting data about credit card transactions. For those I still have to manually log into the Internet bank, export a CSV file and then feed that into the Firefly III importer. Annoying. And I am not really motivated to try to hack my bank :D Has this been useful to any of you? Any ideas to expand or improve what I have? Just find me as "aigarius" on any social media and speak up :)

3 January 2024

John Goerzen: Consider Security First

I write this in the context of my decision to ditch Raspberry Pi OS and move everything I possibly can, including my Raspberry Pi devices, to Debian. I will write about that later. But for now, I wanted to comment on something I think is often overlooked and misunderstood by people considering distributions or operating systems: the huge importance of getting security updates in an automated and easy way.

Background Let s assume that these statements are true, which I think are well-supported by available evidence:

Every computer system (OS plus applications) that can do useful modern work has security vulnerabilities, some of which are unknown at any given point in time;

During the lifetime of that computer system, some of these vulnerabilities will be discovered. For a (hopefully large) subset of those vulnerabilities, timely patches will become available.

Now then, it follows that applying those timely patches is a critical part of having a system that it as secure as possible. Of course, you have to do other things as well good passwords, secure practices, etc but, fundamentally, if your system lacks patches for known vulnerabilities, you ve already lost at the security ballgame.

How to stay patched There is something of a continuum of how you might patch your system. It runs roughly like this, from best to worst:

All components are kept up-to-date automatically, with no intervention from the user/operator

The operator is automatically alerted to necessary patches, and they can be easily installed with minimal intervention

The operator is automatically alerted to necessary patches, but they require significant effort to apply

The operator has no way to detect vulnerabilities or necessary patches

It should be obvious that the first situation is ideal. Every other situation relies on the timeliness of human action to keep up-to-date with security patches. This is a fallible situation; humans are busy, take trips, dismiss alerts, miss alerts, etc. That said, it is rare to find any system living truly all the way in that scenario, as you ll see.

What is your system ? A critical point here is: what is your system ? It includes:

Your kernel

Your base operating system

Your applications

All the libraries needed to run all of the above

Some OSs, such as Debian, make little or no distinction between the base OS and the applications. Others, such as many BSDs, have a distinction there. And in some cases, people will compile or install applications outside of any OS mechanism. (It must be stressed that by doing so, you are taking the responsibility of patching them on your own shoulders.)

How do common systems stack up?

Debian, with its support for unattended-upgrades, needrestart, debian-security-support, and such, is largely category 1. It can automatically apply security patches, in most cases can restart the necessary services for the patch to take effect, and will alert you when some processes or the system must be manually restarted for a patch to take effect (for instance, a kernel update). Those cases requiring manual intervention are category 2. The debian-security-support package will even warn you of gaps in the system. You can also use debsecan to scan for known vulnerabilities on a given installation.

FreeBSD has no way to automatically install security patches for things in the packages collection. As with many rolling-release systems, you can t automate the installation of these security patches with FreeBSD because it is not safe to blindly update packages. It s not safe to blindly update packages because they may bring along more than just security patches: they may represent major upgrades that introduce incompatibilities, etc. Unlike Debian s practice of backporting fixes and thus producing narrowly-tailored patches, forcing upgrades to newer versions precludes a minimal intervention install. Therefore, rolling release systems are category 3.

Things such as Snap, Flatpak, AppImage, Docker containers, Electron apps, and third-party binaries often contain embedded libraries and such for which you have no easy visibility into their status. For instance, if there was a bug in libpng, would you know how many of your containers had a vulnerability? These systems are category 4 you don t even know if you re vulnerable. It s for this reason that my Debian-based Docker containers apply security patches before starting processes, and also run unattended-upgrades and friends.

The pernicious library problem As mentioned in my last category above, hidden vulnerabilities can be a big problem. I ve been writing about this for years. Back in 2017, I wrote an article focused on Docker containers, but which applies to the other systems like Snap and so forth. I cited a study back then that Over 80% of the :latest versions of official images contained at least one high severity vulnerability. The situation is no better now. In December 2023, it was reported that, two years after the critical Log4Shell vulnerability, 25% of apps were still vulnerable to it. Also, only 21% of developers ever update third-party libraries after introducing them into their projects. Clearly, you can t rely on these images with embedded libraries to be secure. And since they are black box, they are difficult to audit. Debian s policy of always splitting libraries out from packages is hugely beneficial; it allows finegrained analysis of not just vulnerabilities, but also the dependency graph. If there s a vulnerability in libpng, you have one place to patch it and you also know exactly what components of your system use it. If you use snaps, or AppImages, you can t know if they contain a deeply embedded vulnerability, nor could you patch it yourself if you even knew. You are at the mercy of upstream detecting and remedying the problem a dicey situation at best.

Who makes the patches? Fundamentally, humans produce security patches. Often, but not always, patches originate with the authors of a program and then are integrated into distribution packages. It should be noted that every security team has finite resources; there will always be some CVEs that aren t patched in a given system for various reasons; perhaps they are not exploitable, or are too low-impact, or have better mitigations than patches. Debian has an excellent security team; they manage the process of integrating patches into Debian, produce Debian Security Advisories, maintain the Debian Security Tracker (which maintains cross-references with the CVE database), etc. Some distributions don t have this infrastructure. For instance, I was unable to find this kind of tracker for Devuan or Raspberry Pi OS. In contrast, Ubuntu and Arch Linux both seem to have active security teams with trackers and advisories.

Implications for Raspberry Pi OS and others As I mentioned above, I m transitioning my Pi devices off Raspberry Pi OS (Raspbian). Security is one reason. Although Raspbian is a fork of Debian, and you can install packages like unattended-upgrades on it, they don t work right because they use the Debian infrastructure, and Raspbian hasn t modified them to use their own infrastructure. I don t see any Raspberry Pi OS security advisories, trackers, etc. In short, they lack the infrastructure to support those Debian tools anyhow. Not only that, but Raspbian lags behind Debian in both new releases and new security patches, sometimes by days or weeks. Live Migrating from Raspberry Pi OS bullseye to Debian bookworm contains instructions for migrating Raspberry Pis to Debian.

2 January 2024

Matthew Garrett: Dealing with weird ELF libraries

Libraries are collections of code that are intended to be usable by multiple consumers (if you're interested in the etymology, watch this video). In the old days we had what we now refer to as "static" libraries, collections of code that existed on disk but which would be copied into newly compiled binaries. We've moved beyond that, thankfully, and now make use of what we call "dynamic" or "shared" libraries - instead of the code being copied into the binary, a reference to the library function is incorporated, and at runtime the code is mapped from the on-disk copy of the shared object[1]. This allows libraries to be upgraded without needing to modify the binaries using them, and if multiple applications are using the same library at once it only requires that one copy of the code be kept in RAM.

But for this to work, two things are necessary: when we build a binary, there has to be a way to reference the relevant library functions in the binary; and when we run a binary, the library code needs to be mapped into the process.

(I'm going to somewhat simplify the explanations from here on - things like symbol versioning make this a bit more complicated but aren't strictly relevant to what I was working on here)

For the first of these, the goal is to replace a call to a function (eg, printf()) with a reference to the actual implementation. This is the job of the linker rather than the compiler (eg, if you use the -c argument to tell gcc to simply compile to an object rather than linking an executable, it's not going to care about whether or not every function called in your code actually exists or not - that'll be figured out when you link all the objects together), and the linker needs to know which symbols (which aren't just functions - libraries can export variables or structures and so on) are available in which libraries. You give the linker a list of libraries, it extracts the symbols available, and resolves the references in your code with references to the library.

But how is that information extracted? Each ELF object has a fixed-size header that contains references to various things, including a reference to a list of "section headers". Each section has a name and a type, but the ones we're interested in are .dynstr and .dynsym. .dynstr contains a list of strings, representing the name of each exported symbol. .dynsym is where things get more interesting - it's a list of structs that contain information about each symbol. This includes a bunch of fairly complicated stuff that you need to care about if you're actually writing a linker, but the relevant entries for this discussion are an index into .dynstr (which means the .dynsym entry isn't sufficient to know the name of a symbol, you need to extract that from .dynstr), along with the location of that symbol within the library. The linker can parse this information and obtain a list of symbol names and addresses, and can now replace the call to printf() with a reference to libc instead.

(Note that it's not possible to simply encode this as "Call this address in this library" - if the library is rebuilt or is a different version, the function could move to a different location)

Experimentally, .dynstr and .dynsym appear to be sufficient for linking a dynamic library at build time - there are other sections related to dynamic linking, but you can link against a library that's missing them. Runtime is where things get more complicated.

When you run a binary that makes use of dynamic libraries, the code from those libraries needs to be mapped into the resulting process. This is the job of the runtime dynamic linker, or RTLD[2]. The RTLD needs to open every library the process requires, map the relevant code into the process's address space, and then rewrite the references in the binary into calls to the library code. This requires more information than is present in .dynstr and .dynsym - at the very least, it needs to know the list of required libraries.

There's a separate section called .dynamic that contains another list of structures, and it's the data here that's used for this purpose. For example, .dynamic contains a bunch of entries of type DT_NEEDED - this is the list of libraries that an executable requires. There's also a bunch of other stuff that's required to actually make all of this work, but the only thing I'm going to touch on is DT_HASH. Doing all this re-linking at runtime involves resolving the locations of a large number of symbols, and if the only way you can do that is by reading a list from .dynsym and then looking up every name in .dynstr that's going to take some time. The DT_HASH entry points to a hash table - the RTLD hashes the symbol name it's trying to resolve, looks it up in that hash table, and gets the symbol entry directly (it still needs to resolve that against .dynstr to make sure it hasn't hit a hash collision - if it has it needs to look up the next hash entry, but this is still generally faster than walking the entire .dynsym list to find the relevant symbol). There's also DT_GNU_HASH which fulfills the same purpose as DT_HASH but uses a more complicated algorithm that performs even better. .dynamic also contains entries pointing at .dynstr and .dynsym, which seems redundant but will become relevant shortly.

So, .dynsym and .dynstr are required at build time, and both are required along with .dynamic at runtime. This seems simple enough, but obviously there's a twist and I'm sorry it's taken so long to get to this point.

I bought a Synology NAS for home backup purposes (my previous solution was a single external USB drive plugged into a small server, which had uncomfortable single point of failure properties). Obviously I decided to poke around at it, and I found something odd - all the libraries Synology ships were entirely lacking any ELF section headers. This meant no .dynstr, .dynsym or .dynamic sections, so how was any of this working? nm asserted that the libraries exported no symbols, and readelf agreed. If I wrote a small app that called a function in one of the libraries and built it, gcc complained that the function was undefined. But executables on the device were clearly resolving the symbols at runtime, and if I loaded them into ghidra the exported functions were visible. If I dlopen()ed them, dlsym() couldn't resolve the symbols - but if I hardcoded the offset into my code, I could call them directly.

Things finally made sense when I discovered that if I passed the --use-dynamic argument to readelf, I did get a list of exported symbols. It turns out that ELF is weirder than I realised. As well as the aforementioned section headers, ELF objects also include a set of program headers. One of the program header types is PT_DYNAMIC. This typically points to the same data that's present in the .dynamic section. Remember when I mentioned that .dynamic contained references to .dynsym and .dynstr? This means that simply pointing at .dynamic is sufficient, there's no need to have separate entries for them.

The same information can be reached from two different locations. The information in the section headers is used at build time, and the information in the program headers at run time[3]. I do not have an explanation for this. But if the information is present in two places, it seems obvious that it should be able to reconstruct the missing section headers in my weird libraries? So that's what this does. It extracts information from the DYNAMIC entry in the program headers and creates equivalent section headers.

There's one thing that makes this more difficult than it might seem. The section header for .dynsym has to contain the number of symbols present in the section. And that information doesn't directly exist in DYNAMIC - to figure out how many symbols exist, you're expected to walk the hash tables and keep track of the largest number you've seen. Since every symbol has to be referenced in the hash table, once you've hit every entry the largest number is the number of exported symbols. This seemed annoying to implement, so instead I cheated, added code to simply pass in the number of symbols on the command line, and then just parsed the output of readelf against the original binaries to extract that information and pass it to my tool.

Somehow, this worked. I now have a bunch of library files that I can link into my own binaries to make it easier to figure out how various things on the Synology work. Now, could someone explain (a) why this information is present in two locations, and (b) why the build-time linker and run-time linker disagree on the canonical source of truth?

[1] "Shared object" is the source of the .so filename extension used in various Unix-style operating systems
[2] You'll note that "RTLD" is not an acryonym for "runtime dynamic linker", because reasons
[3] For environments using the GNU RTLD, at least - I have no idea whether this is the case in all ELF environments

comments

Thomas Koch: Waiting for a STATE folder in the XDG basedir spec

Posted on February 18, 2014

Tags: debian, free software

The XDG Basedirectory specification proposes default homedir folders for the categories DATA (~/.local/share), CONFIG (~/.config) and CACHE (~/.cache). One category however is missing: STATE. This category has been requested several times but nothing happened. Examples for state data are:

history files of shells, repls, anything that uses libreadline
logfiles
state of application windows on exit
recently opened files
last time application was run
emacs: bookmarks, ido last directories, backups, auto-save files, auto-save-list

The missing STATE category is especially annoying if you re managing your dotfiles with a VCS (e.g. via VCSH) and you care to keep your homedir tidy. If you re as annoyed as me about the missing STATE category, please voice your opinion on the XDG mailing list. Of course it s a very long way until applications really use such a STATE directory. But without a common standard it will never happen.

31 December 2023

Chris Lamb: Favourites of 2023

This post should have marked the beginning of my yearly roundups of the favourite books and movies I read and watched in 2023. However, due to coming down with a nasty bout of flu recently and other sundry commitments, I wasn't able to undertake writing the necessary four or five blog posts In lieu of this, however, I will simply present my (unordered and unadorned) highlights for now. Do get in touch if this (or any of my previous posts) have spurred you into picking something up yourself

Books

Films Recent releases

The Blue Caftan (Maryam Touzani, 2022)
The Eight Mountains (Felix van Groeningen & Charlotte Vandermeersch, 2022)
Evil Does Not Exist (Ryusuke Hamaguchi, 2023)
Killers of the Flower Moon (Martin Scorcese, 2023)
Monster (Hirokazu Kore-eda, 2023)
Passages (Ira Sachs, 2023)
Poor Things (Yorgos Lanthimos, 2023)
The Tuba Thieves (Alison O Daniel, 2023)
Theater Camp (Molly Gordon and Nick Lieberman, 2023)
T R (Todd Field, 2022)

Unenjoyable experiences included Alejandro G mez Monteverde's Sound of Freedom (2023), Alex Garland's Men (2022) and Steven Spielberg's The Fabelmans (2022).
Older releases (Films released before 2022, and not including rewatches from previous years.)

Brief Encounter (David Lean, 1945)
Clouds of Sils Maria (Olivier Assayas, 2014)
Daisy Miller (Peter Bogdanovich, 1974)
First Reformed (Paul Schrader, 2017)
Forbidden Games (Ren Cl ment, 1952)
La Noire de... (Ousmane Semb ne, 1966)
The Queen of Spades (Thorold Dickinson, 1949)
The River (Jean Renoir, 1951)
Topsy-Turvy (Mike Leigh, 1999)
Le Trou (Jacques Becker, 1960)

Distinctly unenjoyable watches included Ocean's Eleven (1960), El Topo (1970), L olo (1992), Hotel Mumbai (2018), Bulworth (1998) and and The Big Red One (1980).

Next.

Previous.

Search Results: "gor"

2 March 2024

28 February 2024

25 February 2024

13 February 2024

+1, Helpful If you ve found this post informative, why not buy me a refreshing beverage? My typing fingers (both of them) thank you in advance for your generosity.

8 February 2024

7 February 2024

2 February 2024

29 January 2024

28 January 2024

24 January 2024

Changes in version 0.0.22 (2024-01-23) Replace empty examples macro to satisfy CRAN request.

18 January 2024

16 January 2024

15 January 2024

12 January 2024

11 January 2024

What s missing?

Automated GUI testing / accessibility / automation Automation of GUI tasks is a powerful feature, so is the ability to auto-test GUIs. This is being worked on, with libei and wlheadless-run (and stuff like ydotool exists too), but we re not fully there yet.

4 January 2024

3 January 2024

2 January 2024

31 December 2023

Changes in version 0.0.22 (2024-01-23)

Replace empty examples macro to satisfy CRAN request.